# Bring Your Own Algorithm


SageMaker Training and Hosting 

* Built-in algorithms
* Pre-built container images
    * Supports popular frameworks like MXNet, TensorFlow
    * Flexibility to use a wide selection of algorithms
* Extend pre-built container images
* Custom container images - use a different language and framework

The last three options let you bring your own algorithm.


## Built-In Algorithms

### Training

Built-in algorithms

* Images are stored in a public ECR repository
* Data needed to train and test the model is stored in an S3 bucket
* SageMaker spins up the training instances, downloads the container and hyperparameters, then downloads the training and test data.
* The model is copied to S3, then the training instances are terminated.


### Hosting

* Supply hosting configuration:
    * Image to use 
    * Model in S3 to host
    * Instance type and number of instances
* SageMaker...
    * Spins up the hosting instances
    * Downloads the model
    * Uses the hosting entry point of the container
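
The hosting configuration above maps roughly onto two request payloads in the SageMaker API (`CreateModel` and `CreateEndpointConfig`). The sketch below only builds the dictionaries and does not call AWS. The function name, the `-config` suffix and the `AllTraffic` variant name are illustrative assumptions, not part of the course material.

```python
def hosting_config(name, image_uri, model_s3_uri, role_arn,
                   instance_type="ml.m5.xlarge", instance_count=1):
    """Build the two payloads that describe a hosted model (no AWS calls)."""
    # Which image to use and which model artifact in S3 to host.
    model = {
        "ModelName": name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_s3_uri},
        "ExecutionRoleArn": role_arn,
    }
    # Instance type and number of instances behind the endpoint.
    endpoint_config = {
        "EndpointConfigName": f"{name}-config",  # hypothetical naming scheme
        "ProductionVariants": [{
            "VariantName": "AllTraffic",  # hypothetical variant name
            "ModelName": name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    }
    return model, endpoint_config
```

With boto3, these dictionaries could be passed as keyword arguments to `create_model` and `create_endpoint_config` on a SageMaker client, followed by `create_endpoint`.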
    
## Custom Image

* Package your algorithm in a container that meets SageMaker standards/conventions
* Store the image in ECR

From here, training and hosting are almost identical to the pre-built image flow.

* The container must also specify serving and training entry points
* When training is done, the container must serialize the model artifacts to a local directory
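
SageMaker starts a custom container with the argument `train` for a training job and `serve` for hosting, so one common pattern is a single entry point that dispatches on that argument. The sketch below assumes this layout; the `train` and `serve` bodies are placeholders, and all function names are hypothetical.

```python
def train():
    # Placeholder: read inputs under /opt/ml/input, fit the model,
    # then serialize the artifacts to /opt/ml/model.
    return "training"


def serve():
    # Placeholder: load artifacts from /opt/ml/model and start a web server.
    return "serving"


def main(argv):
    """Dispatch on the first argument passed to the container."""
    commands = {"train": train, "serve": serve}
    if len(argv) < 2 or argv[1] not in commands:
        raise SystemExit("usage: <entrypoint> train|serve")
    return commands[argv[1]]()

# In the image, the entry point script would end with: main(sys.argv)
```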


## Popular Frameworks

SageMaker provides framework images for popular frameworks - SKLearn, TensorFlow, MXNet, PyTorch, etc.

* Write a script that defines the algorithm you want to use, or even defines a new algorithm.
* The container then uses the script during the fit step.
* The script uses hyperparameters and local data to train the model, then writes the fitted model to a local folder
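
A minimal sketch of such a script, using only the standard library. The "algorithm" is a toy stand-in (per-label feature means), the `--decimals` hyperparameter is hypothetical, and the CSV layout (label in the first column) is an assumption. The default paths follow the folder structure described later in these notes.

```python
import argparse
import csv
import json
import os
from collections import defaultdict


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--decimals", type=int, default=4)  # hypothetical hyperparameter
    parser.add_argument("--train", default="/opt/ml/input/data/training")
    parser.add_argument("--model-dir", default="/opt/ml/model")
    return parser.parse_args(argv)


def fit(train_dir, decimals):
    """Toy model: mean of each feature per label (label assumed in column 0)."""
    sums, counts = {}, defaultdict(int)
    for name in sorted(os.listdir(train_dir)):
        if not name.endswith(".csv"):
            continue
        with open(os.path.join(train_dir, name), newline="") as f:
            for row in csv.reader(f):
                if not row:
                    continue
                label, feats = row[0], [float(x) for x in row[1:]]
                acc = sums.setdefault(label, [0.0] * len(feats))
                for i, x in enumerate(feats):
                    acc[i] += x
                counts[label] += 1
    return {k: [round(s / counts[k], decimals) for s in v] for k, v in sums.items()}


def main(argv=None):
    args = parse_args(argv)
    model = fit(args.train, args.decimals)
    # Write the fitted model to the local model folder.
    os.makedirs(args.model_dir, exist_ok=True)
    with open(os.path.join(args.model_dir, "model.json"), "w") as f:
        json.dump(model, f)
    return model
```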

You can also use local mode - helps with script writing, integration, etc.

Hosting

* Use the same image, provide the location of the model in S3
* Might need a script file depending on framework

## Folder Structure and Env Variables

* Standard structure for reading data and resources
* Entry point that contains the code to run when the container is started
* Instrumentation - uses stdout and stderr - logs are sent to CloudWatch
* Metric capture - log metrics and define regex patterns to capture values from the log
* One image for training and hosting, or different images (when compute resource requirements are substantially different)
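
To illustrate metric capture: the training script prints metrics to stdout, and each metric definition pairs a name with a regex whose capture group extracts the value. The log format and metric names below are hypothetical. The `Name`/`Regex` keys follow the shape of SageMaker metric definitions, and the matching is emulated locally with `re`.

```python
import re

# Each definition: a metric name plus a regex with one capture group.
metric_definitions = [
    {"Name": "train:loss", "Regex": r"train_loss=([0-9.]+)"},
    {"Name": "validation:accuracy", "Regex": r"val_acc=([0-9.]+)"},
]


def capture_metrics(log_line, definitions=metric_definitions):
    """Return the metrics that the regexes would extract from one log line."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


# What the training script would print to stdout (hypothetical format).
line = "epoch=3 train_loss=0.412 val_acc=0.93"
print(line)
print(capture_metrics(line))
```

Testing the patterns locally like this, before launching a job, is a cheap way to catch a regex that never matches.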


Folder Structure

* /opt/ml/input - contains hyperparameters and data needed for training 
    * /config
    * /data
        * /channel
* /opt/ml/code - scripts used for training and serving
* /opt/ml/model - stores trained model
* /opt/ml/output - errors that happened during training
    * /failure
    
Details - Training Input

* /opt/ml/input/config/hyperparameters.json - hyperparameters for training
* /opt/ml/input/config/resourceconfig.json - container network layout for training
* /opt/ml/input/data/channel/ - channel = training, testing. Contains the files for each channel, e.g. /opt/ml/input/data/training, /opt/ml/input/data/testing
* /opt/ml/input/data/channel_epoch/ - used in pipe mode. channel = training, test, eval, etc.; epoch = 0, 1, 2... SageMaker creates a named pipe for each channel_epoch; read the pipe to stream data from S3 for that epoch
* /opt/ml/code - scripts to run from the container
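
A minimal sketch of reading these inputs from a custom container, assuming file mode (not pipe mode). Values in `hyperparameters.json` arrive as strings, so the script has to cast them. The function name and the `base_dir` parameter (handy for testing outside the container) are assumptions.

```python
import json
import os


def load_training_input(base_dir="/opt/ml/input"):
    """Return (hyperparameters, {channel name: [file names]})."""
    with open(os.path.join(base_dir, "config", "hyperparameters.json")) as f:
        hyperparameters = json.load(f)  # values are strings, e.g. {"max_depth": "5"}

    data_dir = os.path.join(base_dir, "data")
    channels = {
        channel: sorted(os.listdir(os.path.join(data_dir, channel)))
        for channel in sorted(os.listdir(data_dir))
    }
    return hyperparameters, channels
```

A caller would then cast as needed, for example `int(hyperparameters.get("max_depth", "5"))`, where `max_depth` is a hypothetical hyperparameter name.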


Details - Training Output

* /opt/ml/model/ - the script should write the generated model to this directory. Store model checkpoints and final output here; SageMaker uploads the content of the model folder to your S3 bucket
* /opt/ml/output/failure - if training fails, write the error description to the failure file. SageMaker returns the first 1024 chars as the failure reason in the job description. SageMaker uploads the content of the output folder to S3
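
The failure-file convention can be sketched as a small wrapper around the training function. The wrapper name and the `output_dir` parameter are assumptions. The short error message is written first because only the beginning of the file is surfaced as the failure reason.

```python
import os
import traceback


def run_training(train_fn, output_dir="/opt/ml/output"):
    """Run train_fn; on error, write a description to <output_dir>/failure."""
    try:
        train_fn()
        return 0
    except Exception as exc:
        os.makedirs(output_dir, exist_ok=True)
        with open(os.path.join(output_dir, "failure"), "w") as f:
            # Short summary first, full traceback after it.
            f.write(f"{type(exc).__name__}: {exc}\n{traceback.format_exc()}")
        return 1  # the entry point should exit with this non-zero code
```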

Details - Hosting

* /opt/ml/model - model files to use for inference
* /opt/ml/code - scripts to run from container

Environment Variables

* See https://github.com/aws/sagemaker-containers#how-a-script-is-executed-inside-the-container
* Hyperparameters are available both as environment variables and as script arguments
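
A minimal sketch of reading a few of these variables in a script. The names `SM_HPS`, `SM_MODEL_DIR`, `SM_CHANNELS` and `SM_CHANNEL_<NAME>` are taken from the linked README; check it for the current list. The function name and the `environ` parameter (used so the code can be exercised outside a container) are assumptions.

```python
import json
import os


def read_sagemaker_env(environ=None):
    """Collect hyperparameters, model dir and channel dirs from the environment."""
    environ = os.environ if environ is None else environ
    channels = json.loads(environ.get("SM_CHANNELS", "[]"))  # JSON list of channel names
    return {
        "hyperparameters": json.loads(environ.get("SM_HPS", "{}")),  # JSON object
        "model_dir": environ.get("SM_MODEL_DIR", "/opt/ml/model"),
        "channels": {c: environ.get(f"SM_CHANNEL_{c.upper()}") for c in channels},
    }
```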

How a script is executed - see [here](https://github.com/aws/sagemaker-containers#how-a-script-is-executed-inside-the-container)

## Lab - SKLearn Estimator Bring Your Own

Start with local mode, then switch to cloud training and hosting.

Notes:

* Need ml.m5.xlarge
* Files from [this repo](https://github.com/ChandraLingam/AmazonSageMakerCourse/tree/master/CustomAlgorithm/ScikitLearn/Iris)
    * daemon and setup scripts provided by Amazon
    * data prep from the other iris labs
    * scikit_learn_iris.py is the script
    * iris_scikit_learn_training_and_serving.ipynb is the file that controls learning and serving
    
Troubleshooting

* Set the instance type to local when training in local mode
* Set the instance type to the desired EC2 type (e.g. ml.m5.xlarge) when doing cloud training
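
One way to keep the switch in a single place is a small helper that returns the instance settings. The helper name is hypothetical. The keys match the `instance_type` and `instance_count` keyword arguments of estimators in version 2 of the SageMaker Python SDK (older versions used `train_instance_type` and `train_instance_count`).

```python
def instance_settings(use_local_mode, cloud_instance_type="ml.m5.xlarge"):
    """Return estimator keyword arguments for local mode or cloud training."""
    return {
        "instance_count": 1,
        # "local" runs the container on the notebook instance itself.
        "instance_type": "local" if use_local_mode else cloud_instance_type,
    }
```

The result could be unpacked into an estimator constructor, e.g. `SKLearn(..., **instance_settings(use_local_mode=True))`.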

Misc

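* Health check - the hosting container is pinged periodically by the SageMaker runtime (GET /ping) and must respond to be considered healthy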
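
For a custom serving container, the health check and inference requests arrive as HTTP calls: `GET /ping` and `POST /invocations` on port 8080. A minimal standard-library sketch follows; `predict` is a hypothetical placeholder that echoes its input, and a production container would normally use a proper web server stack.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload: bytes) -> bytes:
    # Hypothetical placeholder: a real container would run the model here.
    return payload


def route(method, path, body=b""):
    """Map a request to (status code, response body)."""
    if method == "GET" and path == "/ping":
        return 200, b""  # healthy
    if method == "POST" and path == "/invocations":
        return 200, predict(body)
    return 404, b""


class Handler(BaseHTTPRequestHandler):
    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._reply(*route("GET", self.path))

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._reply(*route("POST", self.path, self.rfile.read(length)))


def make_server(port=8080):
    return HTTPServer(("0.0.0.0", port), Handler)

# In the container's serve entry point: make_server().serve_forever()
```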

## Lab - TensorFlow Estimator Bring Your Own

Notes:

* [This repo](https://github.com/ChandraLingam/AmazonSageMakerCourse/tree/master/CustomAlgorithm/TensorFlow/Iris)
* [Main script](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/CustomAlgorithm/TensorFlow/Iris/iris_tensorflow_training_and_serving.ipynb) shows changes from iris NN lab
* Loss function is sparse_categorical_crossentropy - takes integer class labels, so it does not require one-hot encoding of labels (categorical_crossentropy, used earlier, does)
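
The two losses compute the same value; they differ only in how the label is supplied. A small pure-Python sketch for a single example (the function names mirror the Keras loss names, but these are hand-written illustrations, not the Keras implementations):

```python
import math


def sparse_categorical_crossentropy(label, probs):
    """Label is an integer class index."""
    return -math.log(probs[label])


def categorical_crossentropy(one_hot_label, probs):
    """Label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


def one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


probs = [0.1, 0.7, 0.2]  # predicted class probabilities for one sample
a = sparse_categorical_crossentropy(1, probs)
b = categorical_crossentropy(one_hot(1, 3), probs)
print(math.isclose(a, b))
```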


