![Author](https://img.shields.io/badge/Author-Soufiane%20AAZIZI-brightgreen)
[![Medium](https://img.shields.io/badge/Medium-Follow%20Me-blue)](https://medium.com/@aazizi.soufiane)
[![GitHub](https://img.shields.io/badge/GitHub-Follow%20Me-lightgrey)](https://github.com/aazizisoufiane)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Connect%20with%20Me-informational)](https://www.linkedin.com/in/soufiane-aazizi-phd-a502829/)

---
# Streamlining Machine Learning Model Deployment with CI/CD and MLOps

In today's ever-evolving landscape of data science and machine learning, proficiency in managing and deploying machine learning models has transitioned from a desirable skill to an absolute necessity. Aspiring data scientists are no longer solely responsible for building sophisticated models; they are also expected to seamlessly integrate these models into real-world applications. This is where the convergence of Continuous Integration and Continuous Deployment (CI/CD) and Machine Learning Operations (MLOps) takes center stage.

In this Jupyter Notebook, we embark on an exciting journey into the realm of CI/CD and MLOps. Our goal is to demystify these crucial concepts and underscore their significance for emerging data scientists. Through hands-on exploration, we will delve into a project that harnesses the power of Amazon SageMaker and AWS Step Functions to simplify the entire process of data preprocessing, model training, and deployment. This project serves as an invaluable stepping stone for data scientists eager to acquire essential CI/CD and MLOps skills, further enhancing their appeal in today's competitive job market.

Join us on this educational journey as we unlock the potential of CI/CD and MLOps in the context of machine learning, empowering you to take your data science expertise to the next level.
## 


## Table of Contents
- [Installing Required Packages](#Installing)
- [Imports](#imports)
- [Initialization](#initialization)
- [Data Preprocessing](#data-preprocessing)
- [Model Training](#model-training)
- [Model Evaluation](#model-evaluation)
- [Results and Discussion](#results-and-discussion)
- [Conclusion](#conclusion)

## Installing Required Packages <a class="anchor" id="Installin"></a>
Before we begin, let's make sure we have all the necessary Python packages installed. You can run the following command to install any missing packages:


In [None]:
!pip install --upgrade pip

In [None]:
!pip install stepfunctions  omegaconf  nb-black

In [None]:
! pip install -U sagemaker

### Auto-formatting and Auto-Reloading Configuration

In [1]:
%load_ext lab_black
%load_ext autoreload
%autoreload 2

## Imports <a class="anchor" id="imports"></a>
- Group all import statements here.

In [None]:
import os
import time
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.huggingface import HuggingFace
from stepfunctions import steps
from stepfunctions.inputs import ExecutionInput
from stepfunctions.workflow import Workflow
from stepfunctions.steps import (
    Chain,
    ChoiceRule,
    ModelStep,
    ProcessingStep,
    TrainingStep,
    TransformStep,
    Parallel,
)
from sagemaker.processing import ProcessingInput, ProcessingOutput
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())


## Initialization <a class="anchor" id="initialization"></a>
- Set up your environment, load data, and configure settings.

In [2]:
# Load environment variables from a .env file
_ = load_dotenv(find_dotenv())

# Retrieve the SageMaker workflow execution role from environment variables
workflow_execution_role = os.getenv("SAGEMAKER_WORKFLOW_ROLE")


NameError: name 'load_dotenv' is not defined

In [261]:
execution_input = ExecutionInput(
    schema={
        "JobName": str,
        "PreprocessingJobName": str,
        "TrainingJobName": str,
        "ModelName": str,
        "EndpointName": str,
    }
)

In [262]:
# Define a Fail state for handling SageMaker processing job failures
failed_state_sagemaker_processing_failure = stepfunctions.steps.states.Fail(
    "ML Workflow failed",
    cause="SageMakerProcessingJobFailed"
)

# Define a Catch state for capturing specific errors and transitioning to the Fail state
catch_state_processing = stepfunctions.steps.states.Catch(
    error_equals=["States.TaskFailed"],
    next_step=failed_state_sagemaker_processing_failure
)


In [263]:
failed_state_sagemaker_training_failure = stepfunctions.steps.states.Fail(
    "ML Training failed", cause="SageMakerTrainingJobFailed"
)

catch_state_training = stepfunctions.steps.states.Catch(
    error_equals=["States.TaskFailed"],
    next_step=failed_state_sagemaker_training_failure,
)

In [263]:
failed_state_sagemaker_training_failure = stepfunctions.steps.states.Fail(
    "ML Save Model failed", cause="SageMakerSaveModelJobFailed"
)

catch_state_save_model = stepfunctions.steps.states.Catch(
    error_equals=["States.TaskFailed"],
    next_step=failed_state_sagemaker_training_failure,
)

In [264]:
failed_state_sagemaker_inference_failure = stepfunctions.steps.states.Fail(
    "ML Inference failed", cause="SageMakerInferenceFailed"
)

catch_state_inference = stepfunctions.steps.states.Catch(
    error_equals=["States.TaskFailed"],
    next_step=failed_state_sagemaker_inference_failure,
)

#### Sagemaker Configuration

In [255]:
# Create a SageMaker Boto3 client with custom configuration
sm_boto = boto3.client(
    "sagemaker",
    config=Config(connect_timeout=5, read_timeout=60, retries={"max_attempts": 60}),
)

# Initialize a SageMaker session using the custom Boto3 client
sagemaker_session = sagemaker.Session(sagemaker_client=sm_boto)

# Retrieve the AWS region from the SageMaker session
region = sagemaker_session.boto_session.region_name

# Get the execution role required for SageMaker operations
role = get_execution_role()

# Retrieve configuration values for the S3 bucket and prefix
bucket = config.s3.bucket
prefix = config.s3.prefix
s3_bucket_base_uri = config.s3.s3_bucket_base_uri


{'total_max_attempts': 61, 'mode': 'legacy'}


# Functions

In [None]:
def job_name(jobname):
    """
    Generate a unique job name for an Amazon SageMaker job based on a given 'jobname' and current timestamp.

    Args:
        jobname (str): A descriptive name for the job.

    Returns:
        str: A unique job name incorporating the 'jobname' and timestamp.
    """
    return f"MultiLabelClassification-{jobname}--{time.strftime('%Y%m%d%H%M%S', time.gmtime())}"


def upload_code(bucket_name, prefix_name, script_location):
    """
    Upload code or script to an Amazon S3 bucket for use in SageMaker.

    Args:
        bucket_name (str): The name of the S3 bucket where the code will be uploaded.
        prefix_name (str): The prefix or directory within the S3 bucket where the code will be stored.
        script_location (str): The local path to the code or script file.

    Returns:
        str: The S3 URI of the uploaded code.
    """
    return sagemaker_session.upload_data(
        script_location,
        bucket=bucket_name,
        key_prefix=f"{prefix_name}/{script_location}",
    )

## Data Preprocessing <a class="anchor" id="data-preprocessing"></a>
- Describe how you preprocess and clean the data.

In [265]:
# Define the location of the preprocessing script within the project
PREPROCESSING_SCRIPT_LOCATION = "preprocess/code"

# Create the output path for preprocessing results using the S3 bucket base URI and prefix
output_preprocess = "{}/{}".format(s3_bucket_base_uri, config.s3.prefix)


In [266]:
input_code_preprocess = sagemaker_session.upload_data(
    PREPROCESSING_SCRIPT_LOCATION,
    bucket=bucket,
    key_prefix=f"{prefix}/{PREPROCESSING_SCRIPT_LOCATION}",
)
inputs_preprocess = [
    ProcessingInput(
        source=f"{config.s3.s3_bucket_base_uri}/{config.s3.input}",
        destination="/opt/ml/processing/input",
        input_name="input-data",
    ),
    ProcessingInput(
        source=input_code_preprocess,
        destination="/opt/ml/processing/input/code",
        input_name="code",
    ),
    ProcessingInput(
        source=f"s3://{bucket}/{prefix}/{PREPROCESSING_SCRIPT_LOCATION}/config",
        destination="/opt/ml/processing/input/config",
        input_name="code-config",
    ),
]


outputs_preprocess = [
    ProcessingOutput(
        source="/opt/ml/processing/train",
        destination=output_preprocess,
        output_name="train_data",
    ),
    ProcessingOutput(
        source="/opt/ml/processing/test",
        destination=output_preprocess,
        output_name="test_data",
    ),
    ProcessingOutput(
        source="/opt/ml/processing/labels",
        destination=output_preprocess,
        output_name="labels_data",
    ),
]

In [267]:
# Create a descriptive name for the preprocessing step, including a timestamp.
preprocessing_step_name = f"Multilabel Classification - Preprocessing Step {time.strftime('%Y%m%d%H%M%S', time.gmtime())}"


# Define an Amazon SageMaker SKLearnProcessor with custom settings.
def sklearn_processor(instance_type="ml.m5.xlarge"):
    """
    Create an SKLearnProcessor instance for Amazon SageMaker processing jobs.

    Args:
        instance_type (str): The Amazon SageMaker instance type for processing jobs.

    Returns:
        sagemaker.processing.SKLearnProcessor: An instance of the SKLearnProcessor.
    """
    return SKLearnProcessor(
        framework_version="1.2-1",
        role=role,  # Ensure the 'role' variable is defined and appropriate.
        instance_type=instance_type,
        instance_count=1,
        # max_runtime_in_seconds=1200,  # Uncomment and customize if needed.
    )


# Define the SageMaker processing step for data preprocessing.
processing_step = ProcessingStep(
    preprocessing_step_name,
    processor=sklearn_processor(),
    job_name=execution_input["PreprocessingJobName"],
    inputs=inputs_preprocess,  # Define your input data configuration here.
    outputs=outputs_preprocess,  # Define your output data configuration here.
    container_entrypoint=["python3", "/opt/ml/processing/input/code/run.py"],
    container_arguments=[
        "--train-test-split-ratio",
        "0.2",
    ],  # Uncomment and customize if needed.
)

## Model Training <a class="anchor" id="model-training"></a>
- Explain the machine learning models, algorithms, and training process.

## Model Evaluation <a class="anchor" id="model-evaluation"></a>
- Evaluate model performance and display results.

## Results and Discussion <a class="anchor" id="results-and-discussion"></a>
- Summarize findings and provide insights.

## Conclusion <a class="anchor" id="conclusion"></a>
- Conclude the notebook and suggest future work.

## Functions

## Config

# Training 

In [269]:
output_config = processing_step.fields["parameters"]["ProcessingOutputConfig"]
for output in output_config["Outputs"]:
    if output["OutputName"] == "train_data":
        preprocessed_training_data = os.path.join(
            output["S3Output"]["S3Uri"], "train.csv"
        )
    if output["OutputName"] == "test_data":
        preprocessed_test_data = os.path.join(output["S3Output"]["S3Uri"], "test.csv")
    if output["OutputName"] == "labels_data":
        preprocessed_labels_data = os.path.join(
            output["S3Output"]["S3Uri"], "labels.csv"
        )
preprocessed_test_data

's3://ck-qa-pldatascience/aws_mlOps/sagemaker-pipeline/stepfunctions/test.csv'

In [270]:
test_s3_file = (
    "s3://ck-qa-pldatascience/aws_mlOps/sagemaker-pipeline/stepfunctions/validation"
)

In [271]:
metric_definitions = [
    {"Name": "eval_loss", "Regex": "'eval_loss': ([0-9]+(.|e\-)[0-9]+),?"},
    {"Name": "eval_accuracy", "Regex": "'eval_accuracy': ([0-9]+(.|e\-)[0-9]+),?"},
    {"Name": "eval_f1", "Regex": "'eval_f1': ([0-9]+(.|e\-)[0-9]+),?"},
    {"Name": "eval_roc", "Regex": "'eval_roc': ([0-9]+(.|e\-)[0-9]+),?"},
    {"Name": "eval_roc_auc", "Regex": "'eval_roc_auc': ([0-9]+(.|e\-)[0-9]+),?"},
    {"Name": "eval_precision", "Regex": "'eval_precision': ([0-9]+(.|e\-)[0-9]+),?"},
    {"Name": "eval_recall", "Regex": "'eval_recall': ([0-9]+(.|e\-)[0-9]+),?"},
    {"Name": "eval_runtime", "Regex": "'eval_runtime': ([0-9]+(.|e\-)[0-9]+),?"},
    {
        "Name": "eval_samples_per_second",
        "Regex": "'eval_samples_per_second': ([0-9]+(.|e\-)[0-9]+),?",
    },
    {"Name": "epoch", "Regex": "'epoch': ([0-9]+(.|e\-)[0-9]+),?"},
]

instance_volume = {
    "ml.g4dn.16xlarge": 900,
    "ml.g4dn.8xlarge": 500,
    "ml.g4dn.4xlarge": 225,
    "ml.g4dn.2xlarge": 225,
    "ml.g4dn.xlarge": 125,
}


params = {
    "epochs": 1,
    "train-batch-size": 8,
    "eval_steps": 1,
    "instance_type": "ml.g4dn.4xlarge",
    "volume_size": 125,
}

In [272]:
def generate_estimator():
    checkpoint_s3_uri = f"s3://{config.s3.bucket}/{config.s3.prefix}/checkpoints"
    output_path = f"s3://{config.s3.bucket}/{config.s3.prefix}/training_jobs"
    hyperparameters = {
        "model_name": "distilbert-base-uncased",
        "epochs": params["epochs"],
        "train-batch-size": params["train-batch-size"],
        "output_dir": checkpoint_s3_uri,  # Use the checkpoint S3 URI
        "eval_steps": params["eval_steps"],
    }

    return HuggingFace(
        entry_point="train.py",
        source_dir="train/code",
        output_path=f"{output_path}/",
        code_location=output_path,
        role=role,
        base_job_name=f"multi-label-classification",
        checkpoint_s3_uri=checkpoint_s3_uri,  # Specify the checkpoint input path here
        instance_type=params["instance_type"],
        instance_count=1,
        transformers_version="4.6",
        pytorch_version="1.7",
        py_version="py36",
        hyperparameters=hyperparameters,
        metric_definitions=metric_definitions,
        volume_size=instance_volume[params["instance_type"]],
        sagemaker_session=sagemaker_session,
    )

In [273]:
params["instance_type"]

'ml.g4dn.4xlarge'

In [274]:
def generate_data():
    train_path = preprocessed_training_data
    test_path = preprocessed_test_data
    labels_path = preprocessed_labels_data
    data = {
        "train": sagemaker.TrainingInput(train_path, content_type="text/libsvm"),
        "test": sagemaker.TrainingInput(test_path, content_type="text/libsvm"),
        "labels": sagemaker.TrainingInput(labels_path, content_type="text/libsvm"),
    }

    return data


def generate_training_step(instance="ml.g4dn.2xlarge"):
    jobname = (
        f"multi-label-classification--{time.strftime('%Y%m%d%H%M%S', time.gmtime())}"
    )
    # mode_ænames.append(jobname)

    training_step = steps.TrainingStep(
        f"Trainning -- instance {instance}",
        estimator=generate_estimator(),
        data=generate_data(),
        job_name=jobname,
        wait_for_completion=True,
    )
    return training_step, jobname

In [275]:
# model = generate_estimator()
# model.fit(generate_data())

In [276]:
# training_step.get_expected_model()

In [277]:
training_step, training_job_name = generate_training_step(params["instance_type"])
# generate_estimator().fit(generate_data(), wait=False)
model_step = steps.ModelStep(
    "Save model",
    model=training_step.get_expected_model(),
    model_name=training_job_name,
    instance_type=params["instance_type"],
)

Using provided s3_resource


In [278]:
training_job_name

'multi-label-classification--20230921173534'

In [279]:
# transform_step = steps.TransformStep(
#     "Transform Input Dataset",
#     transformer=generate_estimator().transformer(
#         instance_count=1, instance_type=params["instance_type"]
#     ),
#     job_name=execution_input["JobName"],
#     model_name=execution_input["ModelName"],
#     data=test_s3_file,
#     content_type="text/libsvm",
# )

In [280]:
transform_step

Transform Input Dataset TransformStep(resource='arn:aws:states:::sagemaker:createTransformJob.sync', parameters={'TransformJobName': <stepfunctions.inputs.placeholders.ExecutionInput object at 0x7ff2d906cd90>, 'ModelName': <stepfunctions.inputs.placeholders.ExecutionInput object at 0x7ff2d906ce90>, 'TransformInput': {'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://ck-qa-pldatascience/aws_mlOps/sagemaker-pipeline/stepfunctions/validation'}}, 'ContentType': 'text/libsvm'}, 'TransformOutput': {'S3OutputPath': 's3://sagemaker-us-east-1-889859566884/multi-label-classification-2023-09-21-17-08-32-461'}, 'TransformResources': {'InstanceCount': 1, 'InstanceType': 'ml.g4dn.4xlarge'}, 'Environment': {}}, type='Task')

In [281]:
jobname = training_step.fields["parameters"]["HyperParameters"][
    "sagemaker_job_name"
].replace('"', "")
model_uri = (
    f'{training_step.fields["parameters"]["OutputDataConfig"]["S3OutputPath"]}{jobname}'
)
model_uri

's3://ck-qa-pldatascience/aws_mlOps/sagemaker-pipeline/stepfunctions/training_jobs/multi-label-classification--20230921173534'

# Inference

In [282]:
# endpoint_step = stepfunctions.steps.EndpointConfigStep(
#     "Create Endpoint Config",
#     endpoint_config_name=execution_input["ModelName"],
#     model_name=execution_input["ModelName"],
#     initial_instance_count=1,
#     instance_type="ml.m5.large",
# )

In [283]:
endpoint_config_step = steps.EndpointConfigStep(
    "Create Endpoint Config",
    endpoint_config_name=training_job_name,  # execution_input["ModelName"],
    model_name=training_job_name,
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

In [284]:
endpoint_step = steps.EndpointStep(
    "Create Endpoint",
    endpoint_name=training_job_name,  # execution_input["EndpointName"],
    endpoint_config_name=training_job_name,  # execution_input["ModelName"],
)

# Create WorkFlow

In [285]:
from stepfunctions.workflow import Workflow

In [294]:
processing_step.add_catch(catch_state_processing)
training_step.add_catch(catch_state_training)
model_step.add_catch(catch_state_save_model)

workflow_graph = Chain([processing_step, training_step])
workflow_graph = Chain(
    [processing_step, training_step, model_step, endpoint_config_step, endpoint_step]
)


branching_workflow = Workflow(
    name=job_name("MlOpsWorkflow"),
    definition=workflow_graph,
    role=workflow_execution_role,
)
# branching_workflow.render_graph()

In [295]:
# branching_workflow.list_workflows()

In [296]:
branching_workflow.create()

'arn:aws:states:us-east-1:889859566884:stateMachine:MultiLabelClassification-MlOpsWorkflow--20230921174740'

In [297]:
# Execute workflow
execution = branching_workflow.execute(
    inputs={
        "PreprocessingJobName": job_name("PreprocessingJobName"),
        "TrainingJobName": job_name("TrainingJobName"),
        # "ModelName": job_name("ModelName"),
        # "JobName": job_name("JobName"),
        # "EndpointName": job_name("EndpointName"),
    }
)

In [298]:
execution.get_output(wait=False)

In [299]:
import uuid

In [292]:
uuid.uuid1().hex

'46323f3c58a511ee83a21f874bbf4bac'

In [293]:
job_name("JobName")

'MultiLabelClassification-JobName--20230921173536'