# Deploy Pre-trained DNABERT Model on Amazon SageMaker

## Installation

*Note:* This notebook requires PyTorch. Run it with the `PyTorch 1.10 Python 3.8 CPU Optimized` kernel on an `ml.t3.medium` instance.

In [None]:
import sys

In [5]:
# Upgrade the SageMaker Python SDK in the current kernel's environment
!{sys.executable} -m pip install -U sagemaker

Collecting sagemaker
  Using cached sagemaker-2.124.0-py2.py3-none-any.whl
Collecting boto3<2.0,>=1.26.28
  Using cached boto3-1.26.32-py3-none-any.whl (132 kB)
Collecting schema
  Using cached schema-0.7.5-py2.py3-none-any.whl (17 kB)
Collecting botocore<1.30.0,>=1.29.32
  Using cached botocore-1.29.32-py3-none-any.whl (10.3 MB)
Collecting s3transfer<0.7.0,>=0.6.0
  Using cached s3transfer-0.6.0-py3-none-any.whl (79 kB)
Collecting contextlib2>=0.5.5
  Using cached contextlib2-21.6.0-py2.py3-none-any.whl (13 kB)
Installing collected packages: contextlib2, schema, botocore, s3transfer, boto3, sagemaker
  Attempting uninstall: botocore
    Found existing installation: botocore 1.24.13
    Uninstalling botocore-1.24.13:
      Successfully uninstalled botocore-1.24.13
  Attempting uninstall: s3transfer
    Found existing installation: s3transfer 0.5.2
    Uninstalling s3transfer-0.5.2:
      Successfully uninstalled s3transfer-0.5.2
  Attempting uninstall: boto3
    Found existing installa

In [2]:
dna_sequence  = 'CTAATC TAATCT AATCTA ATCTAG TCTAGT CTAGTA TAGTAA AGTAAT GTAATG TAATGC AATGCC ATGCCG TGCCGC GCCGCG CCGCGT CGCGTT GCGTTG CGTTGG GTTGGT TTGGTG TGGTGG GGTGGA GTGGAA TGGAAA GGAAAG GAAAGA AAAGAC AAGACA AGACAT GACATG ACATGA CATGAC ATGACA TGACAT GACATA ACATAC CATACC ATACCT TACCTC ACCTCA CCTCAA CTCAAA TCAAAC CAAACA AAACAG AACAGC ACAGCA CAGCAG AGCAGG GCAGGG CAGGGG AGGGGG GGGGGC GGGGCG GGGCGC GGCGCC GCGCCA CGCCAT GCCATG CCATGC CATGCG ATGCGC TGCGCC GCGCCA CGCCAA GCCAAG CCAAGC CAAGCC AAGCCC AGCCCG GCCCGC CCCGCA CCGCAG CGCAGA GCAGAG CAGAGG AGAGGG GAGGGT AGGGTT GGGTTG GGTTGT GTTGTC TTGTCC TGTCCA GTCCAA TCCAAC CCAACT CAACTC AACTCC ACTCCT CTCCTA TCCTAT CCTATT CTATTC TATTCC ATTCCT'
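
The input above is written as space-separated, overlapping 6-mers: each token starts one base to the right of the previous one. A minimal sketch for producing that format from a raw sequence follows. The helper name `seq_to_kmers` and the default `k=6` are assumptions inferred from the example input, not part of the original notebook.

```python
def seq_to_kmers(sequence: str, k: int = 6) -> str:
    """Return overlapping k-mers (stride 1) joined by single spaces."""
    sequence = sequence.strip().upper()
    if len(sequence) < k:
        raise ValueError(f"sequence must contain at least {k} bases")
    return " ".join(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# The first three tokens of the example input come from these 8 bases.
print(seq_to_kmers("CTAATCTA"))  # CTAATC TAATCT AATCTA
```

A sequence of `n` bases yields `n - k + 1` tokens, so the 8-base example produces 3 tokens.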

In [3]:
import boto3

# Look up the AWS region configured for the current session
session = boto3.session.Session()
aws_region = session.region_name

In [4]:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role the endpoint will run under
role = sagemaker.get_execution_role()

# Step 1: Hub model configuration (browse models at https://huggingface.co/models)
hub = {
    'HF_MODEL_ID': 'AidenH20/DNABERT-500down',  # model to load from the Hugging Face Hub
    'HF_TASK': 'text-classification',  # inference pipeline task
}

# Step 2: create the Hugging Face model class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
)

# Step 3: deploy the model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type='ml.m5.xlarge',  # EC2 instance type
)

-----!

In [5]:
predictor.predict({
    'inputs': dna_sequence
})

[{'label': 'LABEL_0', 'score': 0.9938730001449585}]

In [6]:
endpoint_name = predictor.endpoint_name
endpoint_name

'huggingface-pytorch-inference-2022-12-19-16-51-05-856'
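
The endpoint name is what other applications need in order to call the model without the `predictor` object. The following is a sketch, assuming the boto3 `sagemaker-runtime` client's `invoke_endpoint` call and the same JSON payload shape that `predictor.predict` receives. The helper names `make_runtime_client` and `classify_sequence` are hypothetical.

```python
import json


def make_runtime_client(region_name=None):
    """Create a SageMaker runtime client (boto3 is imported lazily)."""
    import boto3
    return boto3.client("sagemaker-runtime", region_name=region_name)


def classify_sequence(client, endpoint_name, kmer_sequence):
    """Send a k-mer sequence to the endpoint and return the parsed JSON reply."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": kmer_sequence}),
    )
    # The response body is a stream of JSON-encoded bytes.
    return json.loads(response["Body"].read())
```

For example, `classify_sequence(make_runtime_client(aws_region), endpoint_name, dna_sequence)` should return the same kind of list of label/score dictionaries as the `predictor.predict` call, provided the endpoint is still in service.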

**Note:** Delete the endpoint when you are no longer using it. A running endpoint incurs charges until it is deleted.

In [8]:
# Delete endpoint
predictor.delete_endpoint()
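
To confirm that nothing is left running, the account's endpoints can be listed after deletion. This is a sketch assuming the boto3 `sagemaker` client's `list_endpoints` call. The helper name `remaining_endpoints` and the `huggingface-pytorch-inference` name filter (taken from the endpoint name shown earlier) are illustrative, and only the first page of results is inspected.

```python
def remaining_endpoints(sm_client, name_contains="huggingface-pytorch-inference"):
    """Return (name, status) pairs for endpoints whose names contain the filter."""
    response = sm_client.list_endpoints(NameContains=name_contains)
    return [
        (ep["EndpointName"], ep["EndpointStatus"])
        for ep in response.get("Endpoints", [])
    ]
```

Called as `remaining_endpoints(boto3.client("sagemaker"))`, an empty list suggests that no matching endpoint is still running. A just-deleted endpoint may briefly appear with a `Deleting` status.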