# Step 2 - Training with SageMaker SDK

In this step, we will modify your existing source code so that it works with the SageMaker training service.  Using the SageMaker training service is not mandatory, but it offers benefits such as on-demand infrastructure provisioning, training job tracking, model tracking, and model deployment.

First, let's get some working folders and boilerplate files created. 


In [83]:
%%sh

DIRECTORY=MY_PROJECT

# mkdir -p creates any missing parent folders and leaves existing folders untouched
for target_dir in src data/train data/validation data/test; do
    mkdir -p "$DIRECTORY/$target_dir"
done



After running the cell above, you will see that several folders have been created. These folders help organize the files used for training with SageMaker.

 - **MY_PROJECT**:  This is the main working folder for this step.
 - **MY_PROJECT/src**:  All your source code from step 1 will be copied here.
 - **MY_PROJECT/data/train**: This is where the training dataset should be stored.
 - **MY_PROJECT/data/validation**: This is where the validation dataset should be stored.
 - **MY_PROJECT/data/test**:  This is where the test dataset should be stored.
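
If you prefer to create the same folder layout from Python rather than from a shell cell, a minimal sketch could look like the one below. The function name `create_project_dirs` is hypothetical (it is not part of the workshop files); the folder names are the ones listed above.

```python
from pathlib import Path


def create_project_dirs(root="MY_PROJECT"):
    """Create the working folders for this step and return their paths.

    Folders that already exist are left untouched.
    """
    root = Path(root)
    subdirs = ["src", "data/train", "data/validation", "data/test"]
    created = []
    for sub in subdirs:
        path = root / sub
        # parents=True also creates MY_PROJECT and MY_PROJECT/data when missing
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_dirs()` from the step-2 directory should produce the same result as the shell cell, and calling it a second time is harmless.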
 
Next, we need to copy some additional files into the **MY_PROJECT** folder based on the ML framework you use.

In this workshop, we currently support either TensorFlow based or scikit-learn based model training.

SageMaker trains ML models in Docker containers, and it provides default training Docker images for many ML frameworks, including TensorFlow, MXNet, Chainer, PyTorch, and scikit-learn.

In this workshop, we will use the SageMaker TensorFlow container for TensorFlow model training.  For scikit-learn, we will build our own training container to demonstrate the flexibility of SageMaker in supporting different model training requirements.

In [84]:
%%sh
# Keep exactly one ML_FRAMEWORK line uncommented.

# Uncomment the line below (and comment out the SK line) if your ML framework is TensorFlow/Keras
#ML_FRAMEWORK=TF  

# Uncomment the line below if your ML framework is scikit-learn
ML_FRAMEWORK=SK

SOURCE_DIR=MY_PROJECT

DIRECTORY=MY_PROJECT

if [ "$ML_FRAMEWORK" = "TF" ]; then
    cp training_inputs.py ./$DIRECTORY/training_inputs.py
    cp job_launcher_tf.ipynb ./$DIRECTORY/job_launcher.ipynb
    cp -a ../step-1/$SOURCE_DIR/* ./$DIRECTORY/src
fi
    
if [ "$ML_FRAMEWORK" = "SK" ]; then
    cp job_launcher_sk.ipynb ./$DIRECTORY/job_launcher.ipynb
    cp -r ./container ./$DIRECTORY/
    cp -a ../step-1/$SOURCE_DIR/* ./$DIRECTORY/src
fi
    


If you want to remove the **MY_PROJECT** folder for any reason, you can run the following command to do that.  

Please note, all content in the **MY_PROJECT** folder will be deleted.

In [85]:
%%sh

# Remove the # sign in the line below to execute the command
#rm -r MY_PROJECT   

## Instructions for scikit-learn based script

For a scikit-learn based algorithm, we will build a Docker container for training.  Please note that SageMaker also provides an SKLearn Estimator, which can be used for training and hosting scikit-learn models.

In the **MY_PROJECT** folder, you should see a **container** folder with the following folders and files inside it:

 - **code_base**: This is where you will keep your training script and all its dependency files. The content of this folder will be copied into the Docker image during the container build step.
 
 - **Dockerfile**: This is the configuration file for building the Docker image. You should be able to use this file as is.
 
 - **build_and_push.sh**: This shell script builds the Docker image and pushes it to your ECR repo. You will only need to change the image name to reflect the name you want to use.
 
 - **local_test.ipynb**:  This notebook contains instructions on how to perform local testing of the Docker image.
 
 - **opt**: This folder structure mirrors the folder structure the container sees when it runs on the SageMaker backend for training. We will use it to help with local testing. You can copy training data, validation data, and test data to the matching folders, and modify the **hyperparameters.json** file in the config folder.  Your training script will use the files in these folders for model training and for saving models.
 

**Follow the steps below to make the changes and build the container**

1. Separate the code for data exploration and data processing from the model training code.

2. Split the dataset into separate training and validation files. If you plan to split the dataset in your training code, then you can keep a single file. 

3. Move the training data and validation data to the **train** and **validation** folders under the **data** directory, respectively.

4. Port over your model training code to the **train** file in the **code_base** directory. The **train** file will be invoked by the container when the training job is kicked off.  

    You will see the following code block in the **train** file, which defines the standard directories that exist in a SageMaker training container.  

    
>prefix = '/opt/ml/'  
>input_path = prefix + 'input/data'   
>output_path = os.path.join(prefix, 'output')  
>model_path = os.path.join(prefix, 'model')   
>param_path = os.path.join(prefix, 'input/config/hyperparameters.json')  

>channel_name='train'   
>training_path = os.path.join(input_path, channel_name)   
>channel_name='validation'   
>validation_path = os.path.join(input_path, channel_name)   


   - **training_path** is where the training dataset will be found.
   - **validation_path** is where the validation dataset will be found.
   - **param_path** points to the location of the **hyperparameters.json** file.
   - **model_path** is where you want to save your model artifacts.

Modify the **train** file to add the library imports, training loop, model evaluation, and model saving code.  You will find various placeholders (e.g. **#BEGIN - ADD YOUR LIBRARY IMPORT BELOW**) in the file to guide you on where to add your custom code.
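
As an illustration of how the paths above might be used inside the **train** file, here is a minimal sketch of two helpers for reading the hyperparameters and locating the input files. The helper names `load_hyperparameters` and `list_input_files` and the `max_depth` hyperparameter are hypothetical and are not taken from the workshop's **train** file; the sketch also assumes that hyperparameter values arrive as strings in **hyperparameters.json**, so they are cast before use.

```python
import json
import os

prefix = '/opt/ml/'
param_path = os.path.join(prefix, 'input/config/hyperparameters.json')
training_path = os.path.join(prefix, 'input/data', 'train')


def load_hyperparameters(path):
    """Return the hyperparameters as a dict, or an empty dict if the file is absent."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def list_input_files(channel_path):
    """Return the full paths of all files in a data channel folder, sorted by name."""
    return sorted(
        os.path.join(channel_path, name)
        for name in os.listdir(channel_path)
        if os.path.isfile(os.path.join(channel_path, name))
    )


def read_settings():
    """Example of combining the helpers; call this from your training code."""
    params = load_hyperparameters(param_path)
    # 'max_depth' is a hypothetical hyperparameter; the value is cast because
    # this sketch assumes values are stored as strings in the JSON file.
    max_depth = int(params.get('max_depth', 5))
    train_files = list_input_files(training_path)
    return max_depth, train_files
```

Falling back to a default when a hyperparameter is missing keeps the script runnable during local testing, where the JSON file may be empty.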

5. Test the script before we build the container.
    - Copy your training dataset file and validation dataset file to the following directories, respectively:
       - MY_PROJECT/container/opt/ml/input/data/train
       - MY_PROJECT/container/opt/ml/input/data/validation
    - Uncomment the line below in the **train** file to use the local directory.  Make sure to comment out this line again before the next step.
    >#prefix = '../opt/ml/' #for local testing without using container
    
    - Remove any Python packages not required for model training.  This will help reduce the size of the Docker image when we build it in the next step.
       
    - Inside a terminal, run the train file by typing **python train** in the directory where the **train** file resides.  The working directory should be:
    
     `/home/ec2-user/SageMaker/SageMaker-Migration-Workshop/step-2/MY_PROJECT/container/code_base`
    
    
6. Now you are ready to build the container. Open the **build_and_push.sh** file and change the name of the image.  Then run it in a terminal by typing **sh build_and_push.sh** to build the container image and push it to ECR.  Make sure you are in the working directory below:

     `/home/ec2-user/SageMaker/SageMaker-Migration-Workshop/step-2/MY_PROJECT/container` 

     You can list the Docker images after the build completes by typing the command below:

     `docker images`


7. Open **local_test.ipynb** and follow its steps to perform local testing and ensure the train script and its dependencies work correctly.  This step is optional but highly recommended; if you want to skip it, go to the next step for SageMaker training.  If any dependency packages are missing, modify the **Dockerfile** to add them and repeat step 6 above. If any packages are not needed for model training, consider removing them to reduce the size of the Docker image.


8. Open **job_launcher.ipynb**, and follow the instructions to launch a training job in SageMaker using the custom docker image we just built and tested.
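
Steps 2 and 3 of the list above (splitting the dataset and placing the files under **data**) can be sketched as follows. This is a minimal example under stated assumptions: the dataset is a single CSV file with a header row, the function name `split_csv` is hypothetical, and the 80/20 split ratio is only a default chosen for illustration. If you already use a library such as scikit-learn for splitting, that works equally well.

```python
import csv
import random


def split_csv(source, train_out, validation_out, validation_fraction=0.2, seed=42):
    """Shuffle the rows of a CSV file and write separate train/validation files.

    The header row is copied to both output files.
    Returns a tuple (number of training rows, number of validation rows).
    """
    with open(source, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # A fixed seed makes the split reproducible between runs
    random.Random(seed).shuffle(rows)
    n_validation = int(len(rows) * validation_fraction)
    splits = {
        validation_out: rows[:n_validation],
        train_out: rows[n_validation:],
    }

    for path, subset in splits.items():
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)

    return len(rows) - n_validation, n_validation
```

For example, `split_csv("dataset.csv", "MY_PROJECT/data/train/train.csv", "MY_PROJECT/data/validation/validation.csv")` would write the two files into the folders created earlier; the file names here are placeholders.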



## Instructions for TensorFlow based script

For a TensorFlow based algorithm, we will use the SageMaker TensorFlow Estimator for training.

In the **MY_PROJECT** folder, you will see the following additional files:

   - **job_launcher.ipynb**: this notebook launches the training job using the SageMaker TensorFlow Estimator.
   - **training_inputs.py**: this module has a function for retrieving a range of parameter values to be used by the training code.
   
Follow the steps below to make the changes and train the model.

### 1. Prepare training script

First you will need to modify your training script. You will need to have your main training script as a `.py` file. If you have a `.ipynb` file, export it to an executable script using the **Export Notebook as..** feature under the **File** menu.

Upload the `.py` file back into the src directory in the working folder. You will need this `.py` file for SageMaker training later.

Now open your main training `.py` script and add the lines in the cell below to the **main** function in your script, or at the beginning of the script. During training, the training code is given the model directory, training data directory, and validation data directory in the form of command line parameters. The code snippet below helps retrieve those parameters. You can also provide your own code for retrieving them.

```python
from training_inputs import parse_args

args, unknown = parse_args()

SM_MODEL_DIR = args.model_dir
SM_CHANNEL_TRAIN = args.train
SM_CHANNEL_VALIDATION = args.validation
```
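
If you would rather retrieve these parameters with your own code, the sketch below shows one way a function like `parse_args` could be written. It is a hypothetical stand-in and not the actual content of **training_inputs.py**; it assumes the values are passed as `--model_dir`, `--train`, and `--validation` command line parameters, with the SageMaker environment variables `SM_MODEL_DIR`, `SM_CHANNEL_TRAIN`, and `SM_CHANNEL_VALIDATION` as fallbacks.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the directories passed to the training script.

    Returns a tuple (args, unknown), where unknown holds any command line
    parameters this parser does not recognize (e.g. extra hyperparameters).
    """
    parser = argparse.ArgumentParser()
    # Defaults fall back to environment variables; they are None when unset
    parser.add_argument('--model_dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--validation', type=str,
                        default=os.environ.get('SM_CHANNEL_VALIDATION'))
    # parse_known_args tolerates parameters that are not declared above
    return parser.parse_known_args(argv)
```

Using `parse_known_args` instead of `parse_args` means the script does not exit with an error when the training job passes additional parameters that the parser does not declare.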

### 2. Modify job_launcher.ipynb

Open the **job_launcher.ipynb** file, and follow its instructions to continue.
