# SageMaker BYOC (Bring Your Own Container) Training

This notebook demonstrates training with a custom Docker container on Amazon SageMaker.

## Overview
BYOC allows complete control over the training environment by using custom Docker containers. This is useful when:
- You need specific library versions
- You have custom dependencies
- You want to replicate your local environment exactly
- You need specialized ML frameworks

## Prerequisites
- Docker installed locally
- AWS CLI configured
- ECR repository created
- Custom container built and pushed to ECR
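
The estimator in this notebook needs the full URI of the image that was pushed to ECR. As a minimal sketch, the helper below assembles that URI from the usual pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The name `ecr_image_uri` is hypothetical (it is not part of the SageMaker SDK), the account ID, repository, and tag in the usage line are placeholders, and the pattern assumes a standard commercial AWS region.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Assemble an ECR image URI (hypothetical helper, not an SDK function)."""
    # AWS account IDs are 12-digit strings; fail early on obvious typos.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    # Assumes a standard commercial region (amazonaws.com domain).
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder values; substitute your own account, region, repository, and tag.
example_uri = ecr_image_uri("123456789012", "us-east-1", "my-training-image", "v1")
print(example_uri)
```

The resulting string is what gets passed as `image_uri` when the estimator is created.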

## Steps
1. Install dependencies
2. Configure training data
3. Define hyperparameters
4. Create estimator with custom container
5. Start training job

## Step 1: Install Dependencies

In [None]:
%pip install sagemaker

In [None]:
import sagemaker
print(sagemaker.__version__)

## Step 2: Configure Training Data

Specify S3 locations for training and validation data.

In [None]:
from sagemaker.inputs import TrainingInput

# Placeholders: replace with your own bucket and dataset prefix.
# Real S3 bucket names must be lowercase.
bucket_name = "Your-Bucket-Name-Here"
datasets_name = "dataset-folder-name"

# Each TrainingInput points at the S3 prefix that holds one data channel.
train_input = TrainingInput(s3_data=f"s3://{bucket_name}/{datasets_name}/train")
valid_input = TrainingInput(s3_data=f"s3://{bucket_name}/{datasets_name}/valid")
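
A malformed S3 URI usually only surfaces once the training job has been submitted, so it can help to build and check the URIs up front. The sketch below is one way to do that. `s3_uri` is a hypothetical helper, `my-bucket` is a placeholder, and the regular expression is a simplified check of the general-purpose bucket naming rules (3 to 63 characters, lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit), not a complete validator.

```python
import re

# Simplified bucket-name check; the full S3 rules have further restrictions.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def s3_uri(bucket: str, *parts: str) -> str:
    """Join a bucket and key parts into an s3:// URI (hypothetical helper)."""
    if not _BUCKET_RE.match(bucket):
        raise ValueError(f"invalid S3 bucket name: {bucket!r}")
    # Strip stray slashes so parts join with exactly one "/" between them.
    key = "/".join(p.strip("/") for p in parts if p.strip("/"))
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


train_uri = s3_uri("my-bucket", "dataset-folder-name", "train")
print(train_uri)
```

A placeholder such as `Your-Bucket-Name-Here` is rejected by this check because it contains uppercase letters.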

## Step 3: Define Hyperparameters

In [None]:
# Passed to the training script as command-line arguments.
hyperparameters = {
    "batch_size": 16,
    "epochs": 1,
    "learning_rate": 0.001,
    "model_name": "DenseNet121",
    "num_classes": 8,
}
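
In script mode, each hyperparameter reaches the entry point as a command-line argument (for example `--batch_size 16`). The `train.py` used here is not shown in this notebook, so the following is only a sketch of how such a script might parse the values defined above. The function name `parse_hyperparameters` and the defaults are assumptions.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the hyperparameters defined in the notebook (illustrative sketch)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--model_name", type=str, default="DenseNet121")
    parser.add_argument("--num_classes", type=int, default=8)
    # parse_known_args tolerates extra arguments instead of exiting with an error.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments a training job might pass to the script.
args = parse_hyperparameters(["--batch_size", "32", "--learning_rate", "0.01"])
print(args.batch_size, args.learning_rate, args.model_name)
```

Declaring a `type` for each argument matters because the values arrive as strings on the command line.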

## Step 4: Create Estimator with Custom Container

In [None]:
from sagemaker import get_execution_role
from sagemaker.pytorch.estimator import PyTorch

execution_role = get_execution_role()

# Training image in us-east-1. For BYOC, replace this with the URI of your own
# image in ECR. If the job runs in another region, use an image from that region.
image_uri = '763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.5.1-gpu-py311-cu124-ubuntu22.04-sagemaker'

estimator = PyTorch(
    source_dir="src",  # local directory with the training code, uploaded for the job
    entry_point="train.py",  # training script at the root of source_dir
    framework_version="2.5.1",  # keep consistent with the PyTorch version in image_uri
    py_version="py311",  # keep consistent with the Python version in image_uri
    image_uri=image_uri,  # container image used for training
    instance_count=1,  # number of instances used for training
    instance_type="ml.g5.xlarge",  # instance type; use "local" or "local_gpu" for local mode
    role=execution_role,  # execution role used by the training job
    hyperparameters=hyperparameters,
    dependencies=["src/requirements.txt"],  # extra paths shipped to the container with the code
)

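## Step 5: Start Training Job

In [None]:
# Channel names become directories under /opt/ml/input/data/ in the container.
# The validation data is passed on the "test" channel.
inputs = {"train": train_input, "test": valid_input}

# Start the training job; by default this blocks and streams logs until the job ends.
estimator.fit(inputs)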
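
Inside the container, each channel name passed to `fit` is mounted under `/opt/ml/input/data/<channel>`, and the training toolkit exposes the paths through environment variables such as `SM_CHANNEL_TRAIN`, `SM_CHANNEL_TEST`, and `SM_MODEL_DIR`. As a rough sketch of how a training script might locate its data and output directory (the function name `resolve_paths` is hypothetical, and the fallbacks are the conventional SageMaker paths):

```python
import os


def resolve_paths(env=None):
    """Locate input channels and the model directory (illustrative sketch)."""
    env = os.environ if env is None else env
    return {
        # One entry per channel name used in the `inputs` dict.
        "train": env.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
        "test": env.get("SM_CHANNEL_TEST", "/opt/ml/input/data/test"),
        # Files written here are packaged as the model artifact.
        "model": env.get("SM_MODEL_DIR", "/opt/ml/model"),
    }


# With an empty environment the conventional default paths are returned.
paths = resolve_paths({})
print(paths["train"], paths["model"])
```

Anything the script saves to the model directory is uploaded to S3 as `model.tar.gz` when the job completes.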