## <center><font size="5" color='black'> ------------- Section 05 - Design Pipeline ------------- </font></center>

In [2]:
"""
# What : This notebook is specifically for designing the pipeline. It is meant to be used by the various deployment notebooks.

# Why : The vision is to keep the pipeline design code segregated from the notebook where the pipeline will be triggered. Designing
        the pipeline can arguably be a development team job and not a deployment team job. The deployment team will simply trigger it.
        
# Anything Else : Yes. At a very high level, here are the steps that will be defined in this pipeline.
1. Create Model
2. Register Model (optional)
3. Batch Transform
"""
print('')




### <font size="2" color='red'>Step Create Model</font>

In [3]:
"""
# What : We are retrieving the stock built-in algorithm image from the SageMaker image_uris repository.

# Why : Since our model was trained with the linear-learner algorithm, we first retrieve its base image and then create
        a model object from our trained model. Optionally, we may want to register this newly created model.

# Anything Else: Nope. Let's keep going.

"""
import boto3
from sagemaker.image_uris import retrieve

base_image_linear_learner = retrieve("linear-learner", boto3.Session().region_name, version="1")
print(f'base_image_linear_learner : {base_image_linear_learner}')

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
base_image_linear_learner : 382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:1
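
The URI printed above follows the usual private ECR layout of account, region, repository and tag. As a quick sanity check that the image really comes from the region we expect, the URI can be split into its parts. This is only a minimal sketch: `parse_ecr_image_uri` is a hypothetical helper (not part of the SageMaker SDK), and the pattern assumes the standard `amazonaws.com` domain.

```python
import re

# Layout assumed here: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_URI_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com"
    r"/(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_ecr_image_uri(image_uri):
    """Split an ECR image URI into account, region, repository and tag."""
    match = ECR_URI_PATTERN.match(image_uri)
    if match is None:
        raise ValueError(f"Not a recognised ECR image URI: {image_uri}")
    return match.groupdict()


# The URI below is the one printed by the cell above.
parts = parse_ecr_image_uri(
    "382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:1"
)
print(parts)
```

In the notebook, one could compare `parts["region"]` with `boto3.Session().region_name` before building the model.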


In [4]:
import utils as u

%store -r deployment_folder
deployment_folder = deployment_folder['deployment_folder']
deploy_artifacts = u.get_deployment_artifacts(s3_url = deployment_folder)
deploy_artifacts

{'eval_metrics': 's3://01-attempts-406016308324/pipelines/use_case_ageVsSalary/deployerRegion/deployment_20240823123423/eval_metrics.csv',
 'model': 's3://01-attempts-406016308324/pipelines/use_case_ageVsSalary/deployerRegion/deployment_20240823123423/model.tar.gz',
 'preprocessRawData': 's3://01-attempts-406016308324/pipelines/use_case_ageVsSalary/deployerRegion/deployment_20240823123423/preprocessRawData.py'}
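
`u.get_deployment_artifacts` is a project helper whose source is not shown in this notebook. Judging only from the output above, each key looks like the file name with its extension removed. A minimal sketch of that mapping is given below, assuming the object keys under the deployment folder have already been listed. `map_artifacts`, the bucket name and the keys are all hypothetical.

```python
import posixpath


def map_artifacts(bucket, keys):
    """Map each object key to {file stem: full S3 URI}."""
    artifacts = {}
    for key in keys:
        filename = posixpath.basename(key)
        if not filename:  # skip "folder" placeholder keys that end with "/"
            continue
        stem = filename.split(".", 1)[0]  # "model.tar.gz" -> "model"
        artifacts[stem] = f"s3://{bucket}/{key}"
    return artifacts


# Hypothetical listing of a deployment folder.
example = map_artifacts(
    "my-bucket",
    [
        "deployments/run1/",
        "deployments/run1/eval_metrics.csv",
        "deployments/run1/model.tar.gz",
        "deployments/run1/preprocessRawData.py",
    ],
)
print(example)
```

Splitting on the first dot is a deliberate choice here so that `model.tar.gz` becomes `model` rather than `model.tar`.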

In [5]:
"""
# What : In this step, we are creating an object of the Model class. This object will be used in the subsequent
         "Step Create Model", which will be a step in our deployment pipeline.

# Why : For the pipeline to deploy the model using a real-time endpoint or a batch transform job, we first need to create
        a model from the model artifact that the development team provides for deployment.

# Anything Else: Yes, and very important. Notice that the sagemaker_session parameter below has been assigned the
                 pipeline_session object. If you recall (or recheck in any development notebook), the session
                 parameter there is always assigned the current sagemaker_session object. Using pipeline_session DEFERS
                 immediate execution of the code. This means that even though we are creating a model object here, no model
                 will be created under SageMaker >> Inference >> Model tab on the SageMaker home screen until the pipeline runs.
                 
                 As an experiment, try executing the same code with sagemaker_session = sagemaker_session and
                 see that the model is immediately created under SageMaker >> Inference >> Model tab on the SageMaker home
                 screen.
         
"""
from sagemaker.model import Model
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker import get_execution_role

role= get_execution_role()
pipeline_session = PipelineSession()

ldn_time = u.get_london_time()

trained_model = Model(
                        image_uri=base_image_linear_learner,
                        name = "AgeVsSalary-model-" + ldn_time,
                        model_data=deploy_artifacts['model'] ,#step_train_model.properties.ModelArtifacts.S3ModelArtifacts,
                        sagemaker_session=pipeline_session,
                        role=role
                    )
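
The "deferred execution" idea described in the docstring above can be illustrated with a toy example. This is a conceptual sketch only and does not reflect how SageMaker implements `PipelineSession` internally. All class names here (`ImmediateSession`, `DeferredSession`, `ToyModel`) are hypothetical.

```python
class ImmediateSession:
    """Toy session: every request is 'sent' as soon as it is made."""

    def __init__(self):
        self.sent = []

    def create_model(self, name):
        self.sent.append(("CreateModel", name))
        return name


class DeferredSession:
    """Toy session: requests are captured and handed back, never sent."""

    def __init__(self):
        self.sent = []

    def create_model(self, name):
        # Return the request so that a pipeline step can run it later.
        return {"action": "CreateModel", "ModelName": name}


class ToyModel:
    def __init__(self, name, session):
        self.name = name
        self.session = session

    def create(self):
        return self.session.create_model(self.name)


immediate = ImmediateSession()
ToyModel("demo-model", immediate).create()

deferred = DeferredSession()
step_args = ToyModel("demo-model", deferred).create()

print(immediate.sent)  # the request was sent straight away
print(deferred.sent)   # nothing was sent
print(step_args)       # the captured request, ready to be attached to a step
```

The same `create()` call has a side effect with one session and merely returns arguments with the other, which is why the choice of session matters in the cell above.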

In [6]:
# from sagemaker.inputs import CreateModelInput
from sagemaker.workflow.model_step import ModelStep

step_create_model = ModelStep(
                                name="step_create_model",
                                step_args=trained_model.create(instance_type="ml.m5.large", accelerator_type="ml.eia1.medium"),
                                )



### <font size="2" color='red'>Step Create Transformer</font>

In [7]:
try:
    %store -r batch_file_location
    batch_file_location = batch_file_location['batch_file_location']
except Exception as e:
    # Raise an explicit error instead of `assert False`, which is skipped when Python runs with -O.
    raise RuntimeError('batch_file_location must be stored (%store) before designing the pipeline') from e

In [8]:
# todo: remove hard coding
batch_file_location = r's3://01-attempts-406016308324/pipelines/use_case_ageVsSalary/deployerRegion/deployment_20240823123423/input-data-for-batch.csv'

In [9]:
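from sagemaker.transformer import Transformer

# Batch transform configuration; the model name is resolved at run time from the create-model step.
# NOTE: output_path reuses the input file URI (batch_file_location); a dedicated output prefix may be preferable.
transformer = Transformer(
                            model_name=step_create_model.properties.ModelName,
                            instance_type="ml.m5.xlarge",
                            instance_count=1,
                            output_path=batch_file_location
                            )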

from sagemaker.inputs import TransformInput
from sagemaker.workflow.steps import TransformStep

step_transform_model = TransformStep(
                                name="step_batch_transform", 
                                transformer=transformer, 
                                inputs=TransformInput(data=batch_file_location,content_type="text/csv",split_type="Line")
                                )
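
In the cell above, both `output_path` and the `TransformInput` data point to `batch_file_location`. If a separate output location is wanted, one option is to derive a sibling prefix from the input file URI. The sketch below shows this under stated assumptions: `sibling_output_prefix`, the folder name `batch-output` and the example bucket are all hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def sibling_output_prefix(input_uri, folder="batch-output"):
    """Return an S3 prefix located next to the input file."""
    parsed = urlparse(input_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected an s3:// URI, got: {input_uri}")
    # Directory part of the object key, without the leading slash.
    parent = posixpath.dirname(parsed.path.lstrip("/"))
    key_prefix = posixpath.join(parent, folder) if parent else folder
    return f"s3://{parsed.netloc}/{key_prefix}/"


output_prefix = sibling_output_prefix(
    "s3://my-bucket/deployments/run1/input-data-for-batch.csv"
)
print(output_prefix)  # s3://my-bucket/deployments/run1/batch-output/
```

The resulting prefix could then be passed as `output_path` to the `Transformer`, keeping inputs and outputs apart.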

In [10]:
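from sagemaker.workflow.parameters import ParameterInteger, ParameterString
from sagemaker.workflow.pipeline import Pipeline

# Pipeline parameters must be Parameter objects rather than bare values such as 1 or "ml.m5.xlarge".
# The parameter names below are illustrative; the steps defined earlier do not reference them yet.
transform_instance_count = ParameterInteger(name="TransformInstanceCount", default_value=1)
transform_instance_type = ParameterString(name="TransformInstanceType", default_value="ml.m5.xlarge")

pipeline_name = f'pipelineAgeVsSalary-{ldn_time}'
pipeline_create_transform = Pipeline(
                                    name=pipeline_name,
                                    parameters=[transform_instance_count, transform_instance_type],
                                    steps = [step_create_model, step_transform_model]
                                    )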

In [11]:
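# `role` was already obtained from get_execution_role() when the Model object was created.
# upsert() creates or updates the pipeline definition; start() submits an execution and returns without waiting for it.
pipeline_create_transform.upsert(role_arn=role)
execution= pipeline_create_transform.start()
print('pipeline executed')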



pipeline executed
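
Note that `start()` returns as soon as the execution has been submitted, so the message above means the run has started, not that it has finished. The execution object returned by the SDK also offers a `wait()` method. The sketch below shows the same idea as a plain polling loop so that it can run without AWS access. `wait_for_pipeline` is a hypothetical helper, and it assumes the describe call returns a dict with a `PipelineExecutionStatus` key, as the DescribePipelineExecution API does.

```python
import time

TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_pipeline(describe, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Poll describe() until the execution reaches a terminal status."""
    for _ in range(max_polls):
        status = describe()["PipelineExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Pipeline still running after {max_polls} polls")


# Stand-in for execution.describe so the sketch runs without AWS access.
fake_responses = iter([
    {"PipelineExecutionStatus": "Executing"},
    {"PipelineExecutionStatus": "Executing"},
    {"PipelineExecutionStatus": "Succeeded"},
])
final_status = wait_for_pipeline(lambda: next(fake_responses), poll_seconds=0)
print(final_status)  # Succeeded
```

In the notebook, the call would look like `wait_for_pipeline(execution.describe)`.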


In [11]:
"""
# What : We are converting this notebook to a .py script.

# Why: Because we want to import this code in our deployment notebook.

# Anything Else : Notice that we cannot return anything from a notebook. But from our .py script, we want to get the designed
                  pipeline object so that we can use it and trigger it. Hence an additional line
                  
                  return pipeline_create_transform
                  
                  is programmatically added AFTER the ipynb is exported as a py script using the nbconvert command.

"""
!jupyter nbconvert --to script designPipeline.ipynb

[NbConvertApp] Converting notebook designPipeline.ipynb to script
[NbConvertApp] Writing 6604 bytes to designPipeline.py


In [13]:
# Call the function to wrap the code
u.wrap_in_function(input_script='designPipeline.py',
                   output_script='designPipelineFormatted.py',
                   function_name='get_pipeline_create_transform(deployment_folder, batch_file_location)')
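
`u.wrap_in_function` is a project helper whose source is not shown here. A minimal sketch of what such a wrapper might do is given below, assuming it indents the exported script under a `def` line and appends the `return` statement mentioned earlier. `wrap_script_in_function` is a hypothetical name, file reading and writing are left out, and the notebook magics that nbconvert exports as `get_ipython()` calls would need separate handling.

```python
import textwrap


def wrap_script_in_function(source, signature, return_expr):
    """Indent `source` under `def <signature>:` and append a return line."""
    body = textwrap.indent(source.rstrip("\n"), "    ")
    return f"def {signature}:\n{body}\n    return {return_expr}\n"


# A tiny stand-in for the exported notebook script.
script = "x = a + b\ny = x * 2\n"
wrapped = wrap_script_in_function(script, "compute(a, b)", "y")
print(wrapped)

# Define the generated function and call it to confirm that it works.
namespace = {}
exec(wrapped, namespace)
result = namespace["compute"](1, 2)  # (1 + 2) * 2
print(result)
```

Wrapping the script this way is what lets the deployment notebook import the generated module and call the function with its own `deployment_folder` and `batch_file_location` values.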

## Time to switch back to "deployment" notebook. Remember we actually were performing deployment.

Open [deployment.ipynb](deployment.ipynb)

In [None]:
# assign model quality, data quality, bias drift etc under sagemaker >> governance >> model dashboard