# Import Qwen3 models to Amazon Bedrock (via local download)

This notebook demonstrates how to deploy custom (fine-tuned) Qwen 3 models to Amazon Bedrock for serverless inference, **downloading a full copy** of the model from [Hugging Face Hub](https://huggingface.co/models) to the notebook during the import.

> ℹ️ **Alternatively**, check out [CMI-Qwen3-HF-StreamCopy.ipynb](CMI-Qwen3-HF-StreamCopy.ipynb) for an example that requires much less local storage.
>
> If your Qwen model is already on Amazon S3, you can skip the download/upload steps and either notebook should work fine.

You can run this example on modest compute (for example the default 2vCPU, 4GB RAM `ml.t3.medium` instance type on SageMaker), although using instance types with higher network performance (like the c7i family) may help to speed up model copy processes. See our example run times listed with "⏰" in each section below, and consider specifying your Hugging Face API key for faster downloads.

You'll also **need to configure notebook storage** to accommodate your target model as described in the ["Notebook infrastructure" section of the import_models README.md](../README.md#Notebook-infrastructure).
- For example, [Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507/tree/main) is ~61GB, so we suggest provisioning at least 67GB to allow for the sample repo and other working files.
- ⚠️ Note the **limits on maximum importable model size** in the [Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html#model-customization-import-model-architecture). At the time of writing, the 470GB [Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507/tree/main) model exceeds these limits so cannot be imported.
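
As a rough sanity check on the sizing guidance above, the sketch below compares free disk space against the model size plus some headroom. The 10% headroom is our own assumption (it happens to reproduce the ~67GB suggestion for a 61GB model), and `required_storage_gb` / `has_enough_space` are hypothetical helper names, not part of any library.

```python
import shutil


def required_storage_gb(model_size_gb: float, headroom: float = 0.10) -> float:
    """Model size plus a fractional allowance for the sample repo and working files.

    The 10% default headroom is an assumption, not an official recommendation.
    """
    return model_size_gb * (1.0 + headroom)


def has_enough_space(path: str, model_size_gb: float, headroom: float = 0.10) -> bool:
    """Check whether the volume containing `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_storage_gb(model_size_gb, headroom)


needed = required_storage_gb(61)
print(f"Suggested free space for a 61GB model: ~{needed:.0f}GB")
print("Enough space in the current folder?", has_enough_space(".", 61))
```

Run it from the folder where the Hugging Face cache will live, so that the check applies to the large storage volume rather than the root disk.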

## Prerequisites
---

First, we'll walk through setting up:
1. An Amazon S3 Bucket in the region where you want to deploy your model
2. Sufficient access/permissions to your target AWS account

### S3 Bucket
---

**For quickest setup**, you can avoid manual work here by running the notebook in a SageMaker AI environment (e.g. a [Notebook Instance](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html) or [SageMaker AI Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated.html) JupyterLab Space) and letting it create a default bucket for you - which will be named `sagemaker-{AWS_REGION}-{AWS_ACCOUNT_ID}`.

...But to keep your environment organized, you might prefer to [create a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) in your target AWS account and region - or use an existing one.

### AWS permissions
---

Refer to the ["AWS permissions" section of import_models/README.md](../README.md#AWS-permissions), which contains example policies that should work for this notebook:
- Your CMI Job role needs to be assumable by Bedrock, and provide `s3:GetObject` and `s3:ListBucket` permissions to read your custom model from your S3 Bucket.
- Your notebook environment needs permissions to 1/ Upload your model to your S3 bucket, 2/ Manage Bedrock Custom Model Import jobs, 3/ View (and optionally for clean-up, delete) imported Custom Models, 4/ [Pass](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_passrole.html) your CMI Job role to Bedrock, and 5/ Invoke the imported model to check it's working.
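
To make the first bullet more concrete, here is a minimal sketch of the two policy documents a CMI Job role typically needs, built as Python dictionaries. The README's example policies remain the reference; the helper names below are hypothetical, the bucket and prefix are placeholders, and the sketch assumes the standard `aws` partition and omits optional hardening such as `Condition` blocks.

```python
import json


def cmi_trust_policy() -> dict:
    """Trust policy allowing the Bedrock service to assume the CMI Job role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def cmi_s3_read_policy(bucket: str, prefix: str = "") -> dict:
    """Permissions policy granting read access to model artifacts in S3."""
    prefix = prefix.strip("/")
    # s3:GetObject applies to objects, s3:ListBucket applies to the bucket itself
    objects_arn = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": objects_arn},
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": f"arn:aws:s3:::{bucket}"},
        ],
    }


# Placeholder bucket and prefix, for illustration only:
print(json.dumps(cmi_trust_policy(), indent=2))
print(json.dumps(cmi_s3_read_policy("my-model-bucket", "bedrock-custom-models/qwen3"), indent=2))
```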

## Configuration
---

With the prerequisites in place and this notebook up and running on appropriate infrastructure, we're ready to dive into the code.

First, you'll need some Python libraries:
- `boto3`, the AWS SDK for Python.
- `huggingface_hub` is **optional** if you've already saved your model to local disk or Amazon S3
- `sagemaker` (the high-level SDK for Amazon SageMaker) is **optional** if:
    1. You know the ARN of the IAM Role you want to use for the Bedrock CMI job, **and**
    2. You already have an S3 bucket to store your model

In [None]:
%pip install boto3 "huggingface_hub>=1.2,<2" sagemaker

Next, import the main libraries and set up the Python clients for the AWS services we'll be using:

In [None]:
# Python Built-Ins:
import json
import os
import time

# External Libraries:
import boto3

bedrock = boto3.client("bedrock")  # Bedrock control plane for managing imports
bedrock_runtime = boto3.client("bedrock-runtime")  # Bedrock runtime for invoking models
s3 = boto3.client("s3")  # S3 for uploading model artifacts

With your Python environment set up, it's time to configure some parameters:

In [None]:
# Your custom model's location on Hugging Face Hub (ignore if you're not importing from HF):
HF_REPO_ID = "Qwen/Qwen3-30B-A3B-Instruct-2507"

# Job name and model name for importing your model to Bedrock:
# (Note these must both be unique within your AWS Account and Region, so you'll need to change them
# if re-running the import)
CMI_JOB_NAME = "import-Qwen3"
IMPORTED_MODEL_NAME = "Qwen3-imported"

If you're running this notebook in SageMaker and are happy to use the default SageMaker S3 Bucket and a single execution Role for both the notebook and the CMI Job, then the following parameters can be configured for you automatically.

> ⚠️ Otherwise, configure them manually (and you can remove the `sagemaker` dependency)

In [None]:
import sagemaker

# IAM Role ARN for the Bedrock CMI Job:
CMI_ROLE = sagemaker.get_execution_role()  # Use current SM notebook execution as CMI Job role
print("IAM role for Bedrock CMI job:", CMI_ROLE)

# S3 Bucket name and folder prefix to upload your model to for import:
S3_BUCKET = sagemaker.Session().default_bucket()
S3_PREFIX = "bedrock-custom-models/qwen3"
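
If you're configuring these parameters manually instead, a sketch like the one below could replace the cell above. All three values are hypothetical placeholders to swap for your own, and `looks_like_role_arn` is an illustrative helper that only checks the general shape of the ARN, not whether the role exists.

```python
import re

# Hypothetical placeholder values: replace these with your own role ARN and bucket.
CMI_ROLE = "arn:aws:iam::123456789012:role/MyBedrockImportRole"
S3_BUCKET = "my-model-bucket"
S3_PREFIX = "bedrock-custom-models/qwen3"


def looks_like_role_arn(arn: str) -> bool:
    """Loose format check for an IAM role ARN (does not call AWS)."""
    pattern = r"arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+"
    return re.fullmatch(pattern, arn) is not None


if not looks_like_role_arn(CMI_ROLE):
    raise ValueError(f"CMI_ROLE doesn't look like an IAM role ARN: {CMI_ROLE}")
print("IAM role for Bedrock CMI job:", CMI_ROLE)
print(f"Model upload location: s3://{S3_BUCKET}/{S3_PREFIX}/")
```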

## Download model artifacts from Hugging Face
---

> ℹ️ If your model is already available locally and doesn't need fetching from Hugging Face, you can skip this step and instead set `local_folder = "{your-model-folder}"`.

The [Hugging Face Hub client library](https://huggingface.co/docs/huggingface_hub/v1.3.1/en/index) manages downloads via a local cache directory to avoid unnecessary repeat downloads.

To pull very large models, we'll first need to ensure this cache folder is created inside the large storage volume we provisioned for the notebook (which is mounted at `/home/ec2-user/SageMaker` in SageMaker Notebook Instances or `/home/sagemaker-user` in SageMaker AI Studio).

> For more information on configuration, see ["Understand caching"](https://huggingface.co/docs/huggingface_hub/v1.3.1/en/guides/manage-cache) in the Hugging Face docs.

In [None]:
# Put all temporary + cache files on the large storage volume:
os.environ["HF_HOME"] = os.path.abspath("cache/hf")
os.environ["HF_HUB_CACHE"] = os.path.abspath("cache/hf")
os.environ["HF_DATASETS_CACHE"] = os.path.abspath("cache/hf")
os.environ["TMPDIR"] = os.path.abspath("cache/tmp")

# If you face failures or flakiness with CAS or Xet downloads (sometimes observed due to network
# restrictions/firewalls, corporate proxies, older SSL stacks, etc), you could also set the below
# variables *BEFORE* importing huggingface_hub (restart your notebook kernel if already imported):
#    os.environ["HF_HUB_ENABLE_HFS_CAS"] = "0"
#    os.environ["HF_HUB_DISABLE_XET"] = "1"

# Create the temp/cache folders if they don't exist already:
os.makedirs(os.environ["HF_HOME"], exist_ok=True)
os.makedirs(os.environ["TMPDIR"], exist_ok=True)

Now we're ready to download the artifacts from Hugging Face to JupyterLab's local file system.

Note that we won't bother to specify a separate `local_dir` here - we'll just read the model directly from the HF cache folder later.

> ⏰ This step may take significant time to complete, depending on the size of your model and the network connection speed from the Hugging Face Hub to the environment where you're running the notebook.
>
> For example in tests with an `ml.t3.medium` notebook in us-east-1 we saw ~8min for a 61GB model. Your results may vary.

In [None]:
%%time

from huggingface_hub import snapshot_download

print(f"Fetching model {HF_REPO_ID}")
local_folder = snapshot_download(repo_id=HF_REPO_ID)
print(f"Downloaded to:\n{local_folder}")
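
To check up front how much data a download like this will pull, one option is to sum the file sizes reported by the Hub. The sketch below assumes `HfApi.model_info(..., files_metadata=True)` populates a `size` (in bytes) on each entry of `siblings`; `total_size_gb` and `repo_size_gb` are hypothetical helper names, and the demonstration uses stand-in objects so it runs without network access.

```python
from types import SimpleNamespace


def total_size_gb(files) -> float:
    """Sum the `size` attribute (bytes) of file entries, treating unknown sizes as 0."""
    return sum(f.size or 0 for f in files) / 1e9


def repo_size_gb(repo_id: str) -> float:
    """Query Hugging Face Hub for the total size of a model repo (needs network access)."""
    from huggingface_hub import HfApi  # Imported lazily so the helper above works offline

    info = HfApi().model_info(repo_id, files_metadata=True)
    return total_size_gb(info.siblings or [])


# Offline demonstration with stand-in file entries (one with an unknown size):
demo_files = [SimpleNamespace(size=4_000_000_000), SimpleNamespace(size=None)]
print(f"Demo total: {total_size_gb(demo_files):.1f} GB")
```

With network access, `repo_size_gb(HF_REPO_ID)` should give a figure to compare against the storage you provisioned for the notebook.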

## Upload model artifacts to S3
---
Now that the model artifacts are available locally, we can upload them to Amazon S3 ready for Bedrock import.

In the cell below we've implemented a Python function to perform the upload, but you could also use the AWS CLI if preferred. For example: `aws s3 sync {local_folder} s3://{S3_BUCKET}/{S3_PREFIX}`

> ⏰ This step may also take a while for larger models - potentially longer than the download.
>
> In tests with an `ml.t3.medium` notebook in us-east-1, we saw ~11min for a 61GB model. Your results may vary.

In [None]:
%%time

def upload_directory_to_s3(local_dir, bucket, prefix):
    """Recursively upload a local directory to S3 under the specified prefix.

    Preserves relative folder structure exactly.
    """
    local_dir = os.path.abspath(local_dir)
    for root, dirs, files in os.walk(local_dir):
        for filename in sorted(files):
            local_path = os.path.join(root, filename)
            relative_path = os.path.relpath(local_path, local_dir)
            s3_key = f"{prefix}/{relative_path}"

            print(f"Uploading: {local_path} → s3://{bucket}/{s3_key}")
            s3.upload_file(local_path, bucket, s3_key)

    print("Upload complete.")


upload_directory_to_s3(local_dir=local_folder, bucket=S3_BUCKET, prefix=S3_PREFIX)
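
Before a long upload, it can be useful to preview which S3 keys would be written. The sketch below is a dry-run variant of the directory walk above: it uploads nothing, and it builds keys with forward slashes regardless of operating system. `plan_s3_keys` is a hypothetical helper, and the temporary folder only stands in for a real model directory.

```python
import os
import tempfile
from pathlib import Path


def plan_s3_keys(local_dir: str, prefix: str) -> list[tuple[str, str]]:
    """Return (local_path, s3_key) pairs for every file under local_dir, without uploading."""
    base = Path(local_dir).resolve()
    prefix = prefix.strip("/")  # Avoid doubled or leading slashes in keys
    plan = []
    for root, _dirs, files in os.walk(base):
        for filename in sorted(files):
            local_path = Path(root) / filename
            # as_posix() keeps "/" separators in keys even on Windows
            relative = local_path.relative_to(base).as_posix()
            key = f"{prefix}/{relative}" if prefix else relative
            plan.append((str(local_path), key))
    return plan


# Demonstration on a small temporary folder standing in for a model directory:
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sub").mkdir()
    Path(tmp, "config.json").write_text("{}")
    Path(tmp, "sub", "weights.bin").write_text("x")
    for _local_path, key in plan_s3_keys(tmp, "bedrock-custom-models/qwen3/"):
        print(key)
```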

## Import to Bedrock
---

Once the model artifacts are on S3, we're ready to start a CMI job to prepare a Bedrock model ready to serve requests.

> ⏰ CMI jobs may take a few minutes, depending on the size of the model artifacts. As an indication, for a Qwen3 30B model (61GB model artifacts) we saw ~15min in one test.

**Troubleshooting errors:**

If your import job fails, you can find more information by looking at the "Imported Models" -> "Jobs" tab of the [Amazon Bedrock Console](https://console.aws.amazon.com/bedrock/home?#/import-models). Click on the failed job and you should see the failure reason with more information.

- One possible cause is incorrect IAM permissions - in which case the failure reason should indicate which part of the IAM policy you need to fix.
- You may also receive a quota exceeded exception if you have too many imported models in your AWS Account+region, or start too many model import jobs at the same time. Refer to the [Service Quotas Console](https://console.aws.amazon.com/servicequotas/home/services/bedrock/quotas) to check your current quotas and request increases where available.

In [None]:
%%time

s3_uri = f"s3://{S3_BUCKET}/{S3_PREFIX}/"

# Create the model import job
response = bedrock.create_model_import_job(
    jobName=CMI_JOB_NAME,
    importedModelName=IMPORTED_MODEL_NAME,
    roleArn=CMI_ROLE,
    modelDataSource={
        "s3DataSource": {"s3Uri": s3_uri},
    }
)

job_arn = response["jobArn"]

# Output the job ARN
print(f"Model import job created with ARN: {job_arn}")

# Check CMI job status
while True:
    response = bedrock.get_model_import_job(jobIdentifier=job_arn)
    status = response['status'].upper()
    print(f"Job status: {status}")

    if status == "FAILED":
        raise RuntimeError(
            "Model import job failed - see the Bedrock console for more details"
        )

    if status == "COMPLETED":
        # Get the model ID
        model_arn = response["importedModelArn"]
        print(f"\n\n✅ Model imported with ID:\n{model_arn}")
        break

    time.sleep(30)  # Check every 30 seconds
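
The loop above polls indefinitely. If you'd prefer an upper bound on waiting time, a generic helper along the following lines could wrap the status check. This is only a sketch: `wait_for_status` is a hypothetical function, the one-hour default timeout is an arbitrary assumption, and the demonstration feeds it canned statuses instead of calling Bedrock.

```python
import time


def wait_for_status(
    get_status,
    done=("COMPLETED",),
    failed=("FAILED",),
    poll_seconds=30.0,
    timeout_seconds=3600.0,
):
    """Poll get_status() until it reports a done/failed status or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status().upper()
        print(f"Job status: {status}")
        if status in failed:
            raise RuntimeError(f"Job ended with status {status}")
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Demonstration with canned statuses in place of a real Bedrock call:
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_status(lambda: next(fake_statuses), poll_seconds=0)
print("Finished with:", final_status)
```

For a real job, `get_status` would be a small function that calls `bedrock.get_model_import_job` for your job ARN and returns the `status` field of the response.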

## Model information (including pricing)
---

In addition to the ["Imported Models" page of the Amazon Bedrock Console](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/import-models), you can use the [GetImportedModel API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetImportedModel.html) to look up useful information about your imported model, including for example:
- `customModelUnits.customModelUnitsPerModelCopy`: The number of Custom Model Units which will be consumed per running copy of your model
    - This determines the pricing of your imported model - per the [Bedrock pricing page](https://aws.amazon.com/bedrock/pricing/)
    - For example, if a model requires 4 CMU per copy then (using the listed rates for us-east-1 region at the time of writing), it would cost 4⨉1.95=\\$7.80/month for storage, plus 4⨉0.05718≈\\$0.23/minute per active copy running inference
    - Note the Bedrock `ModelCopy` metric in Amazon CloudWatch tracks the number of deployed copies of your imported model over time, which auto-scales based on requests
- `instructSupported`: Whether or not this particular imported model supports the Bedrock Converse API
- `modelDataSource`: The (Amazon S3) location your model was imported from

In [None]:
get_model_resp = bedrock.get_imported_model(modelIdentifier=model_arn)

# Render the JSON response nicely:
print(json.dumps(get_model_resp, indent=2, default=str))
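
The CMU arithmetic described above can be sketched as a small calculator. `estimate_cmi_cost` is a hypothetical helper, the storage rate is the us-east-1 figure quoted earlier, and the per-minute inference rate and active minutes below are made-up placeholders: substitute current values from the Bedrock pricing page for your region.

```python
def estimate_cmi_cost(
    cmu_per_copy: int,
    storage_rate_per_cmu_month: float,
    inference_rate_per_cmu_minute: float,
    active_minutes: float,
    copies: int = 1,
) -> dict:
    """Rough monthly cost estimate for an imported model, in the same currency as the rates."""
    storage = cmu_per_copy * storage_rate_per_cmu_month
    inference = cmu_per_copy * copies * inference_rate_per_cmu_minute * active_minutes
    return {
        "storage_per_month": storage,
        "inference": inference,
        "total": storage + inference,
    }


estimate = estimate_cmi_cost(
    cmu_per_copy=4,  # e.g. customModelUnits.customModelUnitsPerModelCopy from GetImportedModel
    storage_rate_per_cmu_month=1.95,  # Storage rate quoted above for us-east-1
    inference_rate_per_cmu_minute=0.05,  # PLACEHOLDER rate, not a real price
    active_minutes=600,  # Assumed: one copy active for 10 hours in the month
)
for name, value in estimate.items():
    print(f"{name}: ${value:.2f}")
```

Because the number of active copies auto-scales with traffic, an estimate like this is only as good as your guess at `copies` and `active_minutes`; the CloudWatch `ModelCopy` metric gives the actual history.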

## Test inference with the imported model
---

When the model import job is completed, you're ready to use your Qwen model for inference.

⚠️ First though, some important points to be aware of (correct at the time of writing in January 2026):

1. **Bedrock's [Converse API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) is not supported** for imported Qwen3 models - only the [InvokeModel API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or the streaming [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html).
2. **Auto-scaling and Cold start `ModelNotReadyException` errors:**
    - Bedrock auto-scales capacity for your imported model based on usage, and you're charged for this capacity in "Custom Model Units" (see [CMI documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/import-model-calculate-cost.html) and the [Bedrock pricing page](https://aws.amazon.com/bedrock/pricing/) for more details).
    - When your model is first imported, or hasn't been invoked at all for some time, this capacity will scale down to zero.
    - The first request made to a "cold" model with zero capacity will return a `ModelNotReadyException` and start re-provisioning your model in the background.
    - Once your capacity is ready (which may take a few minutes), subsequent requests will be successful.
    - As a result, you probably want to configure solutions using imported Bedrock models to retry after some time on receiving this error. For use-cases where traffic is low but improving response time is a higher priority than saving cost, you could also consider automating regular "keep-alive" requests to keep at least some capacity available.

In [None]:
def invoke_textgen_model(request: dict, model_id: str) -> str | None:
    """Invoke model with a JSON payload and return text output"""
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=json.dumps(request).encode("utf-8"),
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response["body"].read())

    # Output field names can vary, so return the first non-empty one found:
    output_text = (
        response_body.get("generation")
        or response_body.get("output_text")
        or response_body.get("generated_text")
        or response_body.get("text")
    )
    return output_text

Remember, you'll likely receive a `ModelNotReadyException` error the first time you run the following cell, as discussed above. Try the following request a few times until your model is ready to respond:

In [None]:
prompt = "What is the meaning of life?"

request = {
    "prompt": prompt,
    "max_gen_len": 1000,
    "top_p": 0.9,
    "temperature": 0.2,
}

reply = invoke_textgen_model(request, model_arn)
print(reply)
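
Rather than re-running the cell by hand, the retry behaviour suggested earlier could be automated along these lines. This is a sketch under assumptions: it detects the error by looking for the `ModelNotReadyException` code in a botocore-style `response` attribute on the exception, the attempt count and delay are arbitrary defaults, and the demonstration uses a fake exception instead of calling Bedrock.

```python
import time


def is_model_not_ready(exc: Exception) -> bool:
    """True if exc carries a botocore-style error code of ModelNotReadyException."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ModelNotReadyException"


def call_with_retry(fn, max_attempts: int = 10, delay_seconds: float = 30.0):
    """Call fn(), retrying only while the model reports that it is not ready."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if not is_model_not_ready(exc) or attempt == max_attempts:
                raise  # Unrelated error, or out of attempts
            print(f"Model not ready (attempt {attempt}/{max_attempts}); retrying in {delay_seconds}s")
            time.sleep(delay_seconds)


# Demonstration with a fake error in place of a real Bedrock call:
class FakeClientError(Exception):
    def __init__(self, code: str):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise FakeClientError("ModelNotReadyException")
    return "ready"


print(call_with_retry(flaky, delay_seconds=0))
```

In the notebook, the wrapped call would be something like `call_with_retry(lambda: invoke_textgen_model(request, model_arn))`.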

## Clean-up
---

Your imported model will automatically scale down to zero *active* Custom Model Units when not in use, but there are ongoing storage and other costs you may like to avoid when you're done experimenting.

> ℹ️ We've commented-out the code cells in this section to avoid you accidentally losing work if clicking "Run all cells" on the notebook. If you do want to run them, you can select the text and type `Ctrl` + `/` to un-comment.

First, since Bedrock itself stores a copy of your imported model, you can delete your artifacts from Amazon S3 without affecting the imported model:

> ⚠️ Note, though, that your model and import job will continue to list the original S3 location in their metadata.
>
> S3 storage costs for model artifacts are usually significantly lower than Bedrock imported model storage costs.

In [None]:
# uri_to_delete = f"s3://{S3_BUCKET}/{S3_PREFIX}"
# print(f"(Interrupt this cell to cancel...)\nAbout to delete {uri_to_delete}")
# time.sleep(5)
# !aws s3 rm --recursive {uri_to_delete}
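
If the AWS CLI isn't available, a similar clean-up can be sketched with boto3's `list_objects_v2` paginator and `delete_objects`. `delete_s3_prefix` is a hypothetical helper that defaults to a dry run, so nothing is deleted unless you pass `dry_run=False`; it takes the S3 client as an argument, and on versioned buckets it would only add delete markers.

```python
def delete_s3_prefix(s3_client, bucket: str, prefix: str, dry_run: bool = True) -> int:
    """Delete (or, by default, just count) all objects under an S3 prefix.

    Returns the number of objects found under the prefix.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    count = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Each page holds at most 1,000 keys, matching the delete_objects batch limit
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        count += len(keys)
        if not dry_run:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
    return count


# Example usage (left commented out, like the other clean-up cells):
# found = delete_s3_prefix(s3, S3_BUCKET, S3_PREFIX + "/", dry_run=True)
# print(f"Objects under prefix: {found}")
```

Passing the prefix with a trailing `/` avoids also matching sibling prefixes that merely start with the same text.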

Second, you can delete your imported model from Bedrock to avoid the ongoing Bedrock model storage costs. If you do this, your model will no longer be available to invoke until re-imported:

In [None]:
# print(f"Deleting imported model {model_arn}")
# bedrock.delete_imported_model(modelIdentifier=model_arn)

## Conclusion
---

Here we showed the end-to-end process of copying a custom Qwen 3 model from Hugging Face Hub to Amazon S3; importing it from there to Amazon Bedrock with Custom Model Import (CMI); and invoking the imported model for inference.

The same process shown here applies to other Qwen 3 variants. For more information about Custom Model Import and its features, refer to the [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html).