# Deploy Evo2 NIM from AWS Marketplace

Evo 2 is a biological foundation model that can interpret and generate DNA sequences across biological scales, from individual molecules to entire genomes, while retaining sensitivity to single-nucleotide changes. This enables zero-shot predictions and the design of complex biological systems.

In this example, we show how to deploy the Evo 2 NIM model from AWS Marketplace.

Evo 2’s potential applications span from accelerating drug discovery to advancing synthetic biology. The Evo 2 model was trained by the Arc Institute on a vast dataset of genomes, which enables it to perform a wide range of tasks, from predicting the impact of mutations on protein function to generating complex molecular systems such as CRISPR-Cas complexes. For example, the model demonstrated its ability to design new versions of the CRISPR genome editor, showcasing its potential to create novel biological tools.

In general, NIMs offer an easy-to-deploy and straightforward route for self-hosted AI applications. Two significant advantages that NIMs offer for system administrators and developers are:

- Increased productivity: NIMs enable developers to build generative AI applications quickly, in minutes rather than weeks, by providing a standardized way to add AI capabilities to their applications.

- Simplified deployment: NIMs provide containers that can be easily deployed on various platforms, including clouds, data centers, or workstations, making it convenient for developers to test and deploy their applications.

Please check out the [Evo2 NIM docs](https://docs.nvidia.com/nim/bionemo/evo2/2.1.0/overview.html) and [NIM LLM docs](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html) for more information.

## ⚠️ Disclaimer

The NIM model provides endpoints that generate DNA sequences, run the model forward pass and save layer outputs, and conduct readiness checks. Visit [**Evo2 NIM endpoints**](https://docs.nvidia.com/nim/bionemo/evo2/2.1.0/endpoints.html) for more details. This notebook shows examples for both the **generate** and the **forward** endpoints.

## Prerequisites
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has the **AmazonSageMakerFullAccess** policy attached.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used:
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**
    2. or your AWS account already has a subscription to this model package.
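
The three Marketplace permissions listed above could be granted with an IAM policy along the following lines. This is only a minimal sketch: the variable name `marketplace_policy` and the wildcard `Resource` are illustrative assumptions, and your organization may require a narrower scope.

```python
import json

# Illustrative IAM policy document covering the three AWS Marketplace
# actions named in the prerequisites. The wildcard resource and the
# variable name are assumptions made for this sketch.
marketplace_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "aws-marketplace:ViewSubscriptions",
                "aws-marketplace:Unsubscribe",
                "aws-marketplace:Subscribe",
            ],
            "Resource": "*",
        }
    ],
}

# Print the JSON form that could be pasted into the IAM console.
print(json.dumps(marketplace_policy, indent=2))
```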


## Subscribe to the model package
To subscribe to the model package:
1. Open the model package listing page.
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
1. Once you click the **Continue to configuration** button and choose a **region**, you will see a **Product Arn** displayed. This is the model package ARN that you need to specify when creating a deployable model. Copy the ARN corresponding to your region and specify it in the following cell.

In [1]:
import boto3, json, sagemaker, time, os
from sagemaker import get_execution_role, ModelPackage
from botocore.config import Config

config = Config(read_timeout=3600)
sess = boto3.Session()
sm = sess.client("sagemaker")
sagemaker_session = sagemaker.Session(boto_session=sess)
role = get_execution_role()
client = boto3.client("sagemaker-runtime", config=config)
region = sess.region_name

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [2]:
# replace the arn below with the model package arn you want to deploy
nim_package = "evo2-40b-2-1-0-3573384a51873302b4086bdb03edf36e"

# Mapping for Model Packages
model_package_map = {
    "us-east-1": f"arn:aws:sagemaker:us-east-1:865070037744:model-package/{nim_package}",
    "us-east-2": f"arn:aws:sagemaker:us-east-2:057799348421:model-package/{nim_package}",
    "us-west-1": f"arn:aws:sagemaker:us-west-1:382657785993:model-package/{nim_package}",
    "us-west-2": f"arn:aws:sagemaker:us-west-2:594846645681:model-package/{nim_package}",
    "ca-central-1": f"arn:aws:sagemaker:ca-central-1:470592106596:model-package/{nim_package}",
    "eu-central-1": f"arn:aws:sagemaker:eu-central-1:446921602837:model-package/{nim_package}",
    "eu-west-1": f"arn:aws:sagemaker:eu-west-1:985815980388:model-package/{nim_package}",
    "eu-west-2": f"arn:aws:sagemaker:eu-west-2:856760150666:model-package/{nim_package}",
    "eu-west-3": f"arn:aws:sagemaker:eu-west-3:843114510376:model-package/{nim_package}",
    "eu-north-1": f"arn:aws:sagemaker:eu-north-1:136758871317:model-package/{nim_package}",
    "ap-southeast-1": f"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/{nim_package}",
    "ap-southeast-2": f"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/{nim_package}",
    "ap-northeast-2": f"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/{nim_package}",
    "ap-northeast-1": f"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/{nim_package}",
    "ap-south-1": f"arn:aws:sagemaker:ap-south-1:077584701553:model-package/{nim_package}",
    "sa-east-1": f"arn:aws:sagemaker:sa-east-1:270155090741:model-package/{nim_package}",
}

region = boto3.Session().region_name
if region not in model_package_map:
    raise Exception(f"Current boto3 session region {region} is not supported.")

model_package_arn = model_package_map[region]
model_package_arn

'arn:aws:sagemaker:us-east-1:865070037744:model-package/evo2-40b-2-1-0-3573384a51873302b4086bdb03edf36e'
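
Before creating the model, it can help to confirm that the selected ARN really is a model package ARN for the expected region. The helper below is a small sketch written for this notebook; `parse_model_package_arn` is a hypothetical name, not part of boto3 or the SageMaker SDK.

```python
def parse_model_package_arn(arn: str) -> dict:
    """Split a SageMaker model package ARN into its components."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, resource_name = parts[5].partition("/")
    if resource_type != "model-package" or not resource_name:
        raise ValueError(f"Not a model package ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "package_name": resource_name,
    }


# The ARN printed by the cell above, split across two lines for readability.
example_arn = (
    "arn:aws:sagemaker:us-east-1:865070037744:model-package/"
    "evo2-40b-2-1-0-3573384a51873302b4086bdb03edf36e"
)
info = parse_model_package_arn(example_arn)
print(info["region"], info["package_name"])
```

In the notebook, `info["region"]` could be compared with the session region before calling `create_model`, so that a mismatched ARN is caught early.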

## Create the SageMaker Endpoint

We first define a SageMaker model from the model package ARN selected above.

In [5]:
# Define the model details
sm_model_name = "evo2-40b-2-1-0-test-1"

# Create the SageMaker model
create_model_response = sm.create_model(
    ModelName=sm_model_name,
    PrimaryContainer={
        'ModelPackageName': model_package_arn
    },
    ExecutionRoleArn=role,
    EnableNetworkIsolation=True
)
print("Model Arn: " + create_model_response["ModelArn"])

Model Arn: arn:aws:sagemaker:us-east-1:492681118881:model/evo2-40b-2-1-0-test-1
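
Because the model name above is hard-coded, re-running the cell while the model still exists would collide with the existing name. One possible workaround is to append a timestamp to the name. The sketch below assumes a 63-character limit and an alphanumeric-plus-hyphen character set for SageMaker resource names; `unique_name` is a hypothetical helper, not an SDK function.

```python
import re
import time


def unique_name(prefix: str, max_length: int = 63) -> str:
    """Append a UTC timestamp to `prefix`, keeping only letters, digits, and hyphens."""
    suffix = time.strftime("%Y%m%d-%H%M%S", time.gmtime())
    # Replace any run of other characters with a single hyphen.
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "-", prefix).strip("-")
    if not cleaned:
        raise ValueError("prefix must contain at least one letter or digit")
    # Trim the prefix so that prefix + "-" + suffix fits within max_length
    # (the limit itself is an assumption; check the SageMaker API reference).
    room = max_length - len(suffix) - 1
    return f"{cleaned[:room].rstrip('-')}-{suffix}"


print(unique_name("evo2-40b-2-1-0-test"))
```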


Next, we create an endpoint configuration that specifies the instance type.

In [6]:
# Create the endpoint configuration
endpoint_config_name = sm_model_name

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            'VariantName': 'AllTraffic',
            'ModelName': sm_model_name,
            'InitialInstanceCount': 1,
            'InstanceType': 'ml.g6e.12xlarge', 
            'InferenceAmiVersion': "al2-ami-sagemaker-inference-gpu-3-1",
            'RoutingConfig': {'RoutingStrategy': 'LEAST_OUTSTANDING_REQUESTS'},
            'ModelDataDownloadTimeoutInSeconds': 3600,  # Model download timeout, in seconds
            'ContainerStartupHealthCheckTimeoutInSeconds': 3600,  # Container startup health check timeout, in seconds
        }
    ]
)
print("Endpoint Config Arn: " + create_endpoint_config_response["EndpointConfigArn"])

Endpoint Config Arn: arn:aws:sagemaker:us-east-1:492681118881:endpoint-config/evo2-40b-2-1-0-test-1


Using the above endpoint configuration, we create a new SageMaker endpoint and wait for the deployment to finish. The status changes to InService once the deployment succeeds.

In [7]:
# Create the endpoint
endpoint_name = endpoint_config_name
create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)

print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])

Endpoint Arn: arn:aws:sagemaker:us-east-1:492681118881:endpoint/evo2-40b-2-1-0-test-1


In [8]:
resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

while status == "Creating":
    time.sleep(60)
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)

Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: InService
Arn: arn:aws:sagemaker:us-east-1:492681118881:endpoint/evo2-40b-2-1-0-test-1
Status: InService
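
The loop above exits on any status other than `Creating`, including `Failed`, and it has no upper bound on waiting time. A slightly more defensive variant might look like the sketch below. The function name `wait_for_endpoint` and its parameters are assumptions for illustration; `describe` is expected to behave like `sm.describe_endpoint`, and passing it in makes the helper testable without AWS access.

```python
import time


def wait_for_endpoint(describe, endpoint_name, poll_seconds=60,
                      timeout_seconds=3600, sleep=time.sleep):
    """Poll until the endpoint is InService, fails, or the timeout expires.

    `describe` is any callable with the same keyword interface as
    `sm.describe_endpoint`.
    """
    waited = 0
    while True:
        resp = describe(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status == "InService":
            return resp
        if status != "Creating":
            # Surface the failure reason if the response carries one.
            reason = resp.get("FailureReason", "no failure reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} is {status}: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint {endpoint_name} still Creating after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Offline demonstration with a stand-in for sm.describe_endpoint.
_states = iter(["Creating", "Creating", "InService"])


def fake_describe(EndpointName):
    return {"EndpointStatus": next(_states), "EndpointArn": "example-arn"}


result = wait_for_endpoint(fake_describe, "demo-endpoint", sleep=lambda s: None)
print(result["EndpointStatus"])
```

In the notebook this would be called as `wait_for_endpoint(sm.describe_endpoint, endpoint_name)`. boto3 also ships a built-in waiter, `sm.get_waiter("endpoint_in_service")`, which may be preferable when custom status printing is not needed.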


### Run Inference

Once the model is deployed, we can send a sample payload as an inference request. The request body is a JSON object with the fields defined by the Evo2 NIM API, as in the payloads below. For an explanation of the supported parameters, see the [Evo2 NIM quickstart guide](https://docs.nvidia.com/nim/bionemo/evo2/2.1.0/quickstart-guide.html).

### Evo2 Generate Endpoint

In [10]:
sm_runtime = boto3.client("sagemaker-runtime", region_name=region)

generate_payload = {
    "sequence": "ACGTACGTACGT",
    "num_tokens": 100,
    "temperature": 0.7,
    "top_k": 3,
}

response = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(generate_payload),
)

result = json.loads(response["Body"].read())
print("Generated DNA:", result["sequence"])
print("Elapsed (ms):", result.get("elapsed_ms"))

Generated DNA: ACGTACATATGTTCGTACATTCGCACAGACGCCATTTTGAAAAATGCTTTAAATGGATTCAGAATTGGTCAAAATGCATAAATCCATCAAAATTTTTTTC
Elapsed (ms): 10771
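
A quick sanity check on the returned sequence can catch unexpected characters and give a rough feel for its composition. The helper below is a plain-Python sketch; `summarize_dna` is a hypothetical name introduced here and is not part of the NIM API.

```python
from collections import Counter


def summarize_dna(sequence: str) -> dict:
    """Return the length, GC fraction, and any characters outside A/C/G/T."""
    seq = sequence.upper()
    counts = Counter(seq)
    unexpected = sorted(set(seq) - set("ACGT"))
    # Guard against division by zero for an empty sequence.
    gc = (counts["G"] + counts["C"]) / len(seq) if seq else 0.0
    return {"length": len(seq), "gc_fraction": gc, "unexpected": unexpected}


# Applied here to the prompt used in the generate payload.
print(summarize_dna("ACGTACGTACGT"))
```

In the notebook, the same function could be applied to `result["sequence"]`.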


### Evo2 Forward Endpoint

In [11]:
forward_payload = {
    "sequence": "ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT",
    "output_layers": [
        "decoder.layers.0.mlp",
        "decoder.layers.10.mlp.linear_fc2",
    ],
}

response = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(forward_payload),
    CustomAttributes="route=forward",  # routes through Caddy to /forward
)

forward_result = json.loads(response["Body"].read())
print("Elapsed (ms):", forward_result["elapsed_ms"])
# Note: this measures the base64-encoded string; the decoded NPZ archive is smaller.
print("NPZ size (bytes):", len(forward_result["data"]))

Elapsed (ms): 1139
NPZ size (bytes): 2676245
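
The generate and forward calls differ only in the `CustomAttributes` route, so they could share one small wrapper. The sketch below assumes that omitting `CustomAttributes` selects the generate route, as in the generate call above; `invoke_json` and `FakeRuntime` are hypothetical names used for illustration, and the fake client only echoes the request so the example runs offline.

```python
import io
import json


def invoke_json(runtime, endpoint_name, payload, route=None):
    """Send a JSON payload to a SageMaker endpoint and decode the JSON reply.

    `runtime` is a boto3 "sagemaker-runtime" client, or any object exposing
    a compatible `invoke_endpoint`. `route` mirrors the CustomAttributes
    value used for the forward call, for example "forward".
    """
    kwargs = {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }
    if route is not None:
        kwargs["CustomAttributes"] = f"route={route}"
    response = runtime.invoke_endpoint(**kwargs)
    return json.loads(response["Body"].read())


class FakeRuntime:
    """Offline stand-in that echoes the request, for illustration only."""

    def invoke_endpoint(self, **kwargs):
        echo = {
            "received": json.loads(kwargs["Body"]),
            "attributes": kwargs.get("CustomAttributes"),
        }
        return {"Body": io.BytesIO(json.dumps(echo).encode("utf-8"))}


reply = invoke_json(FakeRuntime(), "demo-endpoint", {"sequence": "ACGT"}, route="forward")
print(reply)
```

With the real client, the forward call above would become `invoke_json(sm_runtime, endpoint_name, forward_payload, route="forward")`.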


In [14]:
import base64, io
import numpy as np

npz_bytes = base64.b64decode(forward_result["data"])
with np.load(io.BytesIO(npz_bytes)) as data:
    print(data.files)                         # ['decoder.layers.0.mlp.output', ...]
    layer_0 = data["decoder.layers.0.mlp.output"]
    layer_10 = data["decoder.layers.10.mlp.linear_fc2.output"]

['decoder.layers.0.mlp.output', 'decoder.layers.10.mlp.linear_fc2.output']


In [15]:
layer_10.shape

(60, 1, 8192)
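
The leading dimension of this shape appears to match the length of the input sequence, which suggests a `(sequence_length, batch, hidden_size)` layout. That reading is an assumption, not something stated in the output itself. Under that assumption, one common way to obtain a single embedding per sequence is to average over the position axis. The sketch below uses a synthetic array in place of `layer_10`; `mean_pool` is a hypothetical helper.

```python
import numpy as np


def mean_pool(layer_output: np.ndarray) -> np.ndarray:
    """Average over axis 0 (assumed to be sequence position); returns (batch, hidden)."""
    if layer_output.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {layer_output.shape}")
    # Cast up before averaging to limit rounding error with low-precision outputs.
    return layer_output.astype(np.float32).mean(axis=0)


# Synthetic stand-in with the same shape as layer_10 above.
fake_layer = np.ones((60, 1, 8192), dtype=np.float16)
embedding = mean_pool(fake_layer)
print(embedding.shape)
```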

### Terminate endpoint and clean up artifacts

In [16]:
sm.delete_model(ModelName=sm_model_name)
sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': '528cf376-35b1-44a9-9501-51fe059cf051',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '528cf376-35b1-44a9-9501-51fe059cf051',
   'strict-transport-security': 'max-age=47304000; includeSubDomains',
   'x-frame-options': 'DENY',
   'content-security-policy': "frame-ancestors 'none'",
   'cache-control': 'no-cache, no-store, must-revalidate',
   'x-content-type-options': 'nosniff',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Sat, 08 Nov 2025 04:11:16 GMT',
   'content-length': '0'},
  'RetryAttempts': 0}}
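
If one of the delete calls raises, the remaining resources are left behind and the endpoint may keep incurring charges. A more forgiving cleanup could delete the endpoint first and attempt every step regardless of earlier failures. The sketch below illustrates the idea; `delete_endpoint_resources` and `FakeSageMaker` are hypothetical names, and the fake client exists only so the example runs without AWS access.

```python
def delete_endpoint_resources(sm_client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint, then its configuration, then the model.

    Each step is attempted even if an earlier one fails. Failures are
    returned as (resource_label, exception) pairs instead of being raised.
    """
    steps = [
        ("endpoint", sm_client.delete_endpoint, {"EndpointName": endpoint_name}),
        ("endpoint-config", sm_client.delete_endpoint_config,
         {"EndpointConfigName": endpoint_config_name}),
        ("model", sm_client.delete_model, {"ModelName": model_name}),
    ]
    errors = []
    for label, call, kwargs in steps:
        try:
            call(**kwargs)
        except Exception as exc:  # broad on purpose: keep cleaning up
            errors.append((label, exc))
    return errors


class FakeSageMaker:
    """Offline stand-in that records call order and simulates one failure."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append("endpoint")

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append("endpoint-config")
        raise RuntimeError("simulated failure")

    def delete_model(self, ModelName):
        self.calls.append("model")


fake = FakeSageMaker()
errors = delete_endpoint_resources(fake, "demo", "demo", "demo")
print(fake.calls, [label for label, _ in errors])
```

In the notebook, the call would be `delete_endpoint_resources(sm, endpoint_name, endpoint_config_name, sm_model_name)`, followed by a check that the returned list is empty.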