# Training the AMPLIFY Model on Amazon SageMaker

### Important: Setting the Jupyter Working Directory

This notebook assumes that your working directory is set to the following path: `<repo-root>/framework-integrations/sagemaker/training`

#### How to Check and Set the Working Directory
Before running the notebook, you can verify that your working directory is correct by running the following command:

In [None]:
import os
print(os.getcwd())

If the output is not `<repo-root>/framework-integrations/sagemaker/training`, you can set the working directory manually by running:

In [None]:
os.chdir('/path/to/your/repository/framework-integrations/sagemaker/training')
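If you would rather not hard-code an absolute path, the directory can be located by walking up from the current working directory. The following is a minimal sketch that assumes the notebook was launched from somewhere inside the repository; `find_training_dir` and `TRAINING_SUBDIR` are hypothetical helper names, not part of the repository.

```python
import os
from pathlib import Path

# Location of this notebook's folder relative to the repository root
# (taken from the path stated above).
TRAINING_SUBDIR = Path("framework-integrations") / "sagemaker" / "training"


def find_training_dir(start):
    """Return the first `<ancestor>/framework-integrations/sagemaker/training`
    directory found while walking up from `start`, or None if there is none."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        target = candidate / TRAINING_SUBDIR
        if target.is_dir():
            return target
    return None


training_dir = find_training_dir(os.getcwd())
if training_dir is not None:
    # Only change directory when the expected folder was actually found.
    os.chdir(training_dir)
print(os.getcwd())
```

If no matching folder is found, the working directory is left unchanged and you can fall back to the manual `os.chdir` call.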

### Install/Update the environment

In [None]:
!pip install --upgrade awscli botocore sagemaker -q

### Import the libraries

In [None]:
import time
import boto3
import sagemaker
from sagemaker import get_execution_role, Session, image_uris
from sagemaker.pytorch import PyTorch
from sagemaker.huggingface import HuggingFace

In [None]:

sagemaker_session = Session()
region = boto3.Session().region_name
execution_role = get_execution_role()


### Define Data Location

In [None]:
# Adjust this to the S3 URI of your training data
s3_data_location = "s3://amplify-models-aws/data/uniref50/uniref50_sample_100.csv"


### Define the instance type 

In [None]:
instance_type = "ml.g5.12xlarge"

### Define the container

In [None]:
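# Note: this URI points at the us-west-2 registry; the registry region should match the region of your SageMaker session.
image_uri = "763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:2.1.0-transformers4.36.0-gpu-py310-cu121-ubuntu20.04"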
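Because the URI above embeds a region, it may help to build it from the region of the current session instead. The sketch below simply reuses the account ID, repository, and tag from the hard-coded URI; `build_image_uri` is a hypothetical helper. It assumes the same registry account serves your region, which is not true everywhere (some regions use a different account or domain), so check the list of available AWS Deep Learning Containers images before relying on it.

```python
# Assumption: these values are copied from the hard-coded URI in this notebook.
DLC_ACCOUNT_ID = "763104351884"
DLC_REPOSITORY = "huggingface-pytorch-training"
DLC_TAG = "2.1.0-transformers4.36.0-gpu-py310-cu121-ubuntu20.04"


def build_image_uri(region, account_id=DLC_ACCOUNT_ID):
    """Assemble the ECR image URI for the given AWS region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{DLC_REPOSITORY}:{DLC_TAG}"


# In the notebook, `region` from the session cell could be passed here instead.
print(build_image_uri("us-west-2"))
```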


### Define the estimator

In [None]:
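estimator = HuggingFace(
    py_version="py310",                 # SageMaker-style Python version string
    entry_point="train.py",             # training script inside source_dir
    source_dir="code",
    role=execution_role,
    image_uri=image_uri,
    instance_count=1,
    instance_type=instance_type,
    keep_alive_period_in_seconds=1800,  # keep the instance warm for 30 minutes after the job
)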

### Start Training

In [None]:
training_job_name = f"AMPLIFY-hf-training-job-{int(time.time())}"


estimator.fit({
    'train': s3_data_location
}, job_name=training_job_name)
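SageMaker restricts training job names to at most 63 characters drawn from letters, digits, and hyphens, and the name cannot start or end with a hyphen. If you change the prefix used above, a quick local check can catch an invalid name before the job is submitted. This is a minimal sketch; `is_valid_job_name` is a hypothetical helper, and the pattern follows the one documented for `TrainingJobName` in the `CreateTrainingJob` API.

```python
import re
import time

# Pattern documented for TrainingJobName in the CreateTrainingJob API.
JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_JOB_NAME_LENGTH = 63


def is_valid_job_name(name):
    """Return True if `name` satisfies the length and character constraints."""
    if len(name) > MAX_JOB_NAME_LENGTH:
        return False
    return JOB_NAME_PATTERN.fullmatch(name) is not None


candidate = f"AMPLIFY-hf-training-job-{int(time.time())}"
print(candidate, is_valid_job_name(candidate))
```

The check only covers the name's format; names must also be unique within an AWS Region and account, which the timestamp suffix is meant to help with.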

### Get the model data

In [None]:
estimator = HuggingFace.attach(training_job_name)

In [None]:
estimator.model_data
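By default, `estimator.model_data` is a string holding the S3 URI of the packaged model artifact. To download it with `boto3`, the URI first has to be split into a bucket and an object key. The sketch below shows one way to do that with the standard library; `split_s3_uri` is a hypothetical helper and the example URI is made up for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical URI; in the notebook you would pass estimator.model_data instead.
example_uri = "s3://example-bucket/AMPLIFY-hf-training-job-1700000000/output/model.tar.gz"
bucket, key = split_s3_uri(example_uri)
print(bucket, key)

# With AWS credentials available, the artifact could then be fetched with:
# boto3.client("s3").download_file(bucket, key, "model.tar.gz")
```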