# Training Pipeline

**SageMaker Studio Kernel**: Data Science

In this exercise you will:
 - Create and run an Amazon SageMaker pipeline with [SageMaker Pipelines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html)

***

## Part 1/3 - Setup
Here we'll import some libraries and define some variables.

In [None]:
import boto3
import json
import logging
import os
import sagemaker
from sagemaker import get_execution_role
import sys

In [None]:
sys.path.insert(0, os.path.abspath('./../mlpipelines'))

In [None]:
from training.pipeline import get_pipeline

In [None]:
s3_client = boto3.client('s3')
sm_client = boto3.client('sagemaker')

In [None]:
logging.basicConfig(level=logging.INFO)
LOGGER = logging.getLogger(__name__)

***

## Part 2/3 - Create Amazon SageMaker Pipeline

### Explore Pipeline definition

In [None]:
!pygmentize ./../mlpipelines/training/pipeline.py

### Pipeline Parameters

In [None]:
region = boto3.session.Session().region_name
role_name = "mlops-sagemaker-execution-role"
# Look up the account ID once and reuse it for the role ARN and the KMS key.
account_id = boto3.client('sts').get_caller_identity().get('Account')
role = "arn:aws:iam::{}:role/{}".format(account_id, role_name)

kms_account_id = account_id
kms_alias = "ml-kms"

bucket_name = ""  # set this to your S3 bucket name before running the cells below

inference_instance_type = "ml.m5.xlarge"

model_package_group_name = "ml-end-to-end-group"
model_approval_status = "PendingManualApproval"

processing_entrypoint = "./../algorithms/processing/src/processing.py"
processing_framework_version = "0.23-1"
processing_instance_count = 1
processing_instance_type = "ml.t3.large"
processing_input_files_path = "data/input"
processing_output_files_path = "data/output"

training_artifact_path = "artifact/training"
training_artifact_name = "sourcedir.tar.gz"
training_output_files_path = "models"
training_framework_version = "2.4"
training_python_version = "py37"
training_instance_count = 1
training_instance_type = "ml.p2.xlarge"
training_hyperparameters = {
    "epochs": 5,
    "input_file": "processed_data.csv"
}
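
The role ARN above is assembled by string formatting, and `bucket_name` is left empty for you to fill in. As a minimal sketch, the same ARN construction and a quick guard against empty settings could look like the following. The helper names `build_role_arn` and `check_required` and the account ID are hypothetical and not part of the workshop code, and the sketch assumes the standard `aws` partition.

```python
def build_role_arn(account_id, role_name, partition="aws"):
    """Build an IAM role ARN from an account ID and a role name."""
    return "arn:{}:iam::{}:role/{}".format(partition, account_id, role_name)


def check_required(**settings):
    """Return the sorted names of settings that are empty strings or None."""
    return sorted(name for name, value in settings.items() if value in ("", None))


# Hypothetical account ID, for illustration only.
example_arn = build_role_arn("123456789012", "mlops-sagemaker-execution-role")
print(example_arn)

# An empty bucket name is reported; a filled-in alias is not.
missing = check_required(bucket_name="", kms_alias="ml-kms")
print(missing)
```

Running a check like this before the cells below can surface a forgotten `bucket_name` early, rather than as an S3 error later on.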

### Compress source code for installing additional Python modules

In [None]:
! ./../algorithms/buildspec.sh training $bucket_name

### Get the dataset and upload it to an S3 bucket

In [None]:
boto_session = boto3.Session(region_name=region)

sagemaker_client = boto_session.client("sagemaker")
runtime_client = boto_session.client("sagemaker-runtime")

sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_runtime_client=runtime_client,
    default_bucket=bucket_name
)

In [None]:
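# Remove any objects left under the input prefix by earlier runs.
# delete_object only removes a single exact key, so delete by prefix instead.
s3_bucket = boto3.resource('s3').Bucket(bucket_name)
s3_bucket.objects.filter(Prefix=processing_input_files_path).delete()

# Upload the dataset to the session's default bucket under the same prefix.
input_data = sagemaker_session.upload_data('./../data/TheSocialDilemma.csv', key_prefix=processing_input_files_path)

print(input_data)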
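
`upload_data` returns the S3 URI of the uploaded object. As a sketch of what to expect for a single file, the helper below rebuilds that URI by hand. The helper `expected_s3_uri` and the bucket name are hypothetical, and the layout `s3://<bucket>/<key_prefix>/<file name>` is an assumption about the SDK's behaviour for single-file uploads, so compare it with the value printed above.

```python
import posixpath


def expected_s3_uri(bucket, key_prefix, local_path):
    """Compose the S3 URI a single-file upload is assumed to produce."""
    file_name = posixpath.basename(local_path)
    return "s3://{}/{}/{}".format(bucket, key_prefix, file_name)


# Hypothetical bucket name, for illustration only.
print(expected_s3_uri("my-example-bucket", "data/input", "./../data/TheSocialDilemma.csv"))
```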

***

## Training pipeline

### Get pipeline definition

In [None]:
pipeline = get_pipeline(
    region,
    kms_account_id,
    kms_alias,
    bucket_name,
    inference_instance_type,
    model_package_group_name,
    processing_entrypoint,
    processing_framework_version,
    processing_instance_count,
    processing_instance_type,
    processing_input_files_path,
    processing_output_files_path,
    training_artifact_path,
    training_artifact_name,
    training_output_files_path,
    training_framework_version,
    training_python_version,
    training_instance_count,
    training_instance_type,
    training_hyperparameters,
    role=role,
    pipeline_name="MLOpsTrainPipeline"
)

### Create or update SageMaker pipeline

In [None]:
pipeline.upsert(role_arn=role)

In [None]:
json.loads(pipeline.definition())
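
The full definition can be long. As a rough sketch, a small helper can pull out just the parameter names and the step names with their types. The helper `summarize_definition` and the sample definition are hypothetical; the sketch assumes the definition JSON carries top-level `Parameters` and `Steps` lists whose entries have `Name` and `Type` keys, which you can confirm against the output above.

```python
import json


def summarize_definition(definition_json):
    """Return (parameter names, [(step name, step type), ...])."""
    definition = json.loads(definition_json)
    parameters = [p["Name"] for p in definition.get("Parameters", [])]
    steps = [(s["Name"], s["Type"]) for s in definition.get("Steps", [])]
    return parameters, steps


# Hypothetical, heavily trimmed definition used only to illustrate the shape.
sample = json.dumps({
    "Version": "2020-12-01",
    "Parameters": [{"Name": "Epochs", "Type": "String", "DefaultValue": "5"}],
    "Steps": [
        {"Name": "ProcessData", "Type": "Processing", "Arguments": {}},
        {"Name": "TrainModel", "Type": "Training", "Arguments": {}},
    ],
})
parameters, steps = summarize_definition(sample)
print(parameters)
print(steps)
```

In the notebook, the same helper could be called as `summarize_definition(pipeline.definition())`.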

***

## Part 3/3 - Run SageMaker Pipeline

### Start training pipeline 

In [None]:
execution = pipeline.start()

In [None]:
execution.describe()

In [None]:
execution.list_steps()
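
`list_steps()` returns one dictionary per step. A minimal sketch for condensing that output into a name-to-status mapping is shown below. The helper `summarize_steps` and the sample data are hypothetical; the sketch assumes each entry has `StepName` and `StepStatus` keys and, for failed steps, an optional `FailureReason`.

```python
def summarize_steps(steps):
    """Map each step name to its status, appending the failure reason if present."""
    summary = {}
    for step in steps:
        status = step.get("StepStatus", "Unknown")
        reason = step.get("FailureReason")
        if reason is not None:
            status = "{} ({})".format(status, reason)
        summary[step["StepName"]] = status
    return summary


# Hypothetical step list, for illustration only.
sample_steps = [
    {"StepName": "TrainModel", "StepStatus": "Executing"},
    {"StepName": "ProcessData", "StepStatus": "Succeeded"},
]
step_summary = summarize_steps(sample_steps)
for name, status in step_summary.items():
    print(name, "->", status)
```

In the notebook, this could be called as `summarize_steps(execution.list_steps())` while the execution is running or after it finishes.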

### Start training pipeline with overridden parameters

In [None]:
args = {
    "Epochs": "",
    "ModelApprovalStatus": "",
    "ModelPackageGroupName": "",
    "ProcessingInput": ""
}

In [None]:
execution = pipeline.start(
    parameters=args
)
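
The `args` dictionary above is a template with empty values. As a sketch, one way to override only some parameters is to drop the empty entries before starting the pipeline, on the assumption that any parameter you leave out keeps the default from the pipeline definition. The helper `non_empty_overrides` and the example values are hypothetical; check the expected type of each parameter in the pipeline definition before choosing real values.

```python
def non_empty_overrides(args):
    """Keep only the parameters that were actually given a value."""
    return {name: value for name, value in args.items() if value not in ("", None)}


# Hypothetical override values, for illustration only.
example_args = {
    "Epochs": "10",
    "ModelApprovalStatus": "Approved",
    "ModelPackageGroupName": "",
    "ProcessingInput": "",
}
overrides = non_empty_overrides(example_args)
print(overrides)
```

The filtered dictionary could then be passed as `pipeline.start(parameters=non_empty_overrides(args))`.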