<img src="img/hpe_logo.png" alt="HPE Logo" width="125">

# HPE ML Platform Workshop - Data Preparation

<img src="./img/platform_step01_data.png"></img>

## Install pachctl (MLDM client)

In [None]:
import os

In [None]:
p_ver = "2.8.2"
cm = f"curl -L https://github.com/pachyderm/pachyderm/releases/download/v{p_ver}/pachctl_{p_ver}_linux_amd64.tar.gz | tar -xzv --strip-components=1 -C /usr/local/bin"

In [None]:
os.system(cm)
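
`os.system` returns the shell's exit status but does not raise on failure, so a failed download can go unnoticed. The following is a minimal sketch of the same install step with error checking. The helper names `pachctl_install_command` and `install_pachctl` are hypothetical, and the sketch assumes a Linux amd64 host with `bash`, `curl`, and write access to `/usr/local/bin`, as the original command does.

```python
import subprocess


def pachctl_install_command(version: str, dest: str = "/usr/local/bin") -> str:
    """Build the same curl | tar pipeline used in the cell above (hypothetical helper)."""
    url = (
        "https://github.com/pachyderm/pachyderm/releases/download/"
        f"v{version}/pachctl_{version}_linux_amd64.tar.gz"
    )
    # curl -f fails on HTTP errors; pipefail propagates that failure through the pipe.
    return f"set -o pipefail; curl -fL {url} | tar -xzv --strip-components=1 -C {dest}"


def install_pachctl(version: str) -> None:
    """Run the install command; raises CalledProcessError if any stage fails."""
    subprocess.run(["bash", "-c", pachctl_install_command(version)], check=True)
```

Calling `install_pachctl(p_ver)` would then stand in for the `os.system(cm)` cell.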

## Connect to the cluster

In [None]:
cluster_address = "grpc://34.68.160.253:80"

!pachctl connect $cluster_address
#!pachctl auth login

&nbsp;

## Create the MLDM project

In [None]:
project_name = "user01-brain-mri-ws" # replace user01 with your own user name

In [None]:
!pachctl create project $project_name

In [None]:
!pachctl list projects # the default project should be the active one

## Set the context to your project

In [None]:
!pachctl config update context --project $project_name

In [None]:
!pachctl list projects # your project should be the active one

In [None]:
!pachctl list repos # should be empty: the new project has no repos yet

## Create the first repository

<img src="./img/01_mldm/01.png" alt="MLDM - Important Concepts" width=800px></img>

In [None]:
!pachctl create repo brain-mri-data

In [None]:
!pachctl list repos # brain-mri-data repo should be listed here

## Upload files

In [None]:
!pachctl put file brain-mri-data@master:/data1 -f ../images/brain -r

In [None]:
!pachctl list files brain-mri-data@master

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Alt+Tab: Check the Repo in the MLDM UI
</div>

# Data Pre-Processing

<img src="./img/01_mldm/02.png" alt="Pipelines" width=800px></img>

## Creating the Validation pipeline

In [None]:
!cat ../pipelines/1_validate.yaml
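
The cell above prints the actual spec used in this workshop. For orientation only, a pipeline spec generally names the pipeline, declares an input repo with a glob pattern, and describes the container command to run. The sketch below builds such a structure as a Python dictionary. The pipeline name, image, and command are hypothetical placeholders and are not taken from `1_validate.yaml`.

```python
import json


def minimal_pipeline_spec(name, repo, glob, image, cmd):
    """Return the general shape of a pipeline spec (illustrative, not the workshop file)."""
    return {
        "pipeline": {"name": name},
        "input": {"pfs": {"repo": repo, "glob": glob}},
        "transform": {"image": image, "cmd": cmd},
    }


spec = minimal_pipeline_spec(
    name="example_validate",             # hypothetical pipeline name
    repo="brain-mri-data",               # the repo created earlier in this lab
    glob="/*",                           # one datum per top-level entry
    image="python:3.11-slim",            # hypothetical container image
    cmd=["python", "/app/validate.py"],  # hypothetical entry point
)
print(json.dumps(spec, indent=2))
```

The `glob` value is the field that determines how the input is split into datums, which the next cells explore.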

#### Get a list of datums for this pipeline (based on the glob pattern used)

In [None]:
!pachctl list datums -f ../pipelines/1_validate.yaml

#### Create the pipeline

In [None]:
!pachctl create pipeline -f ../pipelines/1_validate.yaml

In [None]:
!pachctl list pipeline

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Alt+Tab: Check Pipeline in the MLDM UI
</div>

## Datums and glob patterns

<img src="https://mldm.pachyderm.com/images/distributed-computing101.gif" alt="Datums" width=800px></img>

<img src="./img/01_mldm/03.png" alt="Glob Patterns" width=800px></img>

In [None]:
# how many datums does the validation pipeline see?
!pachctl list datum -f ../pipelines/1_validate.yaml
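
To build intuition for how the glob pattern controls the datum count, here is a simplified model in plain Python. It only covers the patterns `/`, `/*`, and `/*/*`, treating each as "one datum per distinct path at that depth". The file paths and the helper `datums_at_depth` are hypothetical; the real datum list comes from `pachctl list datum`.

```python
def datums_at_depth(paths, depth):
    """Return distinct path prefixes with exactly `depth` components.

    depth 0 models the glob "/", depth 1 models "/*", depth 2 models "/*/*".
    """
    if depth == 0:
        return ["/"]  # the whole repo is a single datum
    prefixes = set()
    for path in paths:
        parts = path.strip("/").split("/")
        if len(parts) >= depth:
            prefixes.add("/" + "/".join(parts[:depth]))
    return sorted(prefixes)


# Hypothetical repo contents, for illustration only.
files = [
    "/data1/scan_a.tif",
    "/data1/scan_b.tif",
    "/data1/extra/scan_c.png",
]

for depth, pattern in enumerate(["/", "/*", "/*/*"]):
    datums = datums_at_depth(files, depth)
    print(f"{pattern:5} -> {len(datums)} datum(s): {datums}")
```

With these sample paths, `/` and `/*` each yield one datum, while `/*/*` yields three, so a deeper pattern gives the pipeline more units of work to process in parallel.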

In [None]:
# Get 'first-level' files and folders
!pachctl glob file brain-mri-data@master:/*

In [None]:
# Get only 'second-level' files and folders
!pachctl glob file brain-mri-data@master:/*/*

In [None]:
# Get only TIFF files. The braces need to be escaped only in the notebook environment
!pachctl glob file brain-mri-data@master:/**.\{tif,tiff\}

In [None]:
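# Intended to list non-TIFF files. Note that [!.tif] is a character class:
# it matches paths whose last character is not '.', 't', 'i' or 'f'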
!pachctl glob file brain-mri-data@master:/**[!.tif]
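
The bracket expression `[!.tif]` negates a set of single characters rather than the `.tif` extension. The sketch below illustrates this with Python's standard `fnmatch` module, which only approximates the glob syntax used by `pachctl`. It then contrasts the result with an explicit suffix check. The file names and the helper `is_tiff` are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical file names, for illustration only.
names = ["scan.tif", "scan.tiff", "mask.gif", "notes.txt", "preview.png"]

# "*[!.tif]" keeps names whose LAST character is not '.', 't', 'i' or 'f'.
by_char_class = [n for n in names if fnmatchcase(n, "*[!.tif]")]


def is_tiff(name):
    """True when the file extension is .tif or .tiff (case-insensitive)."""
    return PurePosixPath(name).suffix.lower() in {".tif", ".tiff"}


# An explicit suffix check keeps every file that is not a TIFF.
by_suffix = [n for n in names if not is_tiff(n)]

print("character class:", by_char_class)
print("suffix check:   ", by_suffix)
```

In this sample the character class also drops `mask.gif` and `notes.txt`, because they end in `f` and `t`. The pattern works as a non-TIFF filter only when no other file in the repo ends in one of those four characters.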

## Creating the Conversion pipeline

In [None]:
!cat ../pipelines/2_convert.yaml

#### Get a list of datums for this pipeline (based on the glob pattern used)

In [None]:
!pachctl list datums -f ../pipelines/2_convert.yaml

#### Create the pipeline

In [None]:
!pachctl create pipeline -f ../pipelines/2_convert.yaml

In [None]:
!pachctl list pipeline

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Alt+Tab: Check Pipeline in the MLDM UI
</div>

## Inspecting jobs, logs and pipeline spec

In [None]:
!pachctl list pipeline --spec --output yaml # prints the full spec of every pipeline; by default each spec corresponds to the latest commit

In [None]:
!pachctl list job --pipeline convert_images

In [None]:
!pachctl list job --pipeline convert_images --raw --output json | jq -r '.job.id'

In [None]:
job_id = !pachctl list job --pipeline convert_images --raw --output json | jq -r '.job.id'
job_id = job_id[0]
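
If `jq` is not available, the same extraction can be done in Python. The sketch below assumes, as the `jq` filter above implies, that the raw output is a sequence of JSON objects, each with a `job.id` field. The sample text and the helper names `iter_json_objects` and `first_job_id` are hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def first_job_id(text):
    """Return the job ID of the first object in the stream, or None if it is empty."""
    for obj in iter_json_objects(text):
        return obj["job"]["id"]
    return None


# Hypothetical stand-in for the output of `pachctl list job ... --raw --output json`.
sample = '{"job": {"id": "aaa111"}}\n{"job": {"id": "bbb222"}}\n'
print(first_job_id(sample))
```

In the notebook, the captured lines can be passed in as one string, for example `first_job_id("\n".join(raw_lines))`, where `raw_lines` is the hypothetical result of `!pachctl list job ... --raw --output json` without the `jq` stage.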

In [None]:
!pachctl inspect job convert_images@{job_id}  # inspect the job whose ID was captured above

In [None]:
!pachctl logs --pipeline convert_images

In [None]:
!pachctl logs --job convert_images@{job_id}

## Creating the Resizing pipeline

In [None]:
!cat ../pipelines/3_resize.yaml

#### Get a list of datums for this pipeline (based on the glob pattern used)

In [None]:
!pachctl list datums -f ../pipelines/3_resize.yaml

#### Create the pipeline

In [None]:
!pachctl create pipeline -f ../pipelines/3_resize.yaml

In [None]:
!pachctl list pipeline

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Alt+Tab: Check Pipeline in the MLDM UI
</div>

## Creating the Report pipeline

<img src="./img/01_mldm/04.png" alt="Types of Pipeline Input" width=800px></img>
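
As a rough illustration of how a pipeline with more than one input forms its datums, the sketch below models two common input types: a union processes the datums of each input independently, while a cross pairs every datum of one input with every datum of the other. The file names and the helpers `union_datums` and `cross_datums` are hypothetical. Which input type `4_report.yaml` actually uses is shown by the next cell.

```python
from itertools import product


def union_datums(*inputs):
    """Union: every datum from every input becomes its own unit of work."""
    return [(datum,) for datums in inputs for datum in datums]


def cross_datums(*inputs):
    """Cross: one unit of work per combination of datums across the inputs."""
    return list(product(*inputs))


# Hypothetical datums from two input repos, for illustration only.
images = ["/img_1.png", "/img_2.png"]
configs = ["/cfg_a.json", "/cfg_b.json", "/cfg_c.json"]

print("union:", len(union_datums(images, configs)), "datums")
print("cross:", len(cross_datums(images, configs)), "datums")
```

With two images and three configs, the union yields five datums and the cross yields six, so the choice of input type directly affects how much work the pipeline schedules.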

In [None]:
!cat ../pipelines/4_report.yaml

#### Get a list of datums for this pipeline (based on the glob pattern used)

In [None]:
!pachctl list datums -f ../pipelines/4_report.yaml

#### Create the pipeline

In [None]:
!pachctl create pipeline -f ../pipelines/4_report.yaml

In [None]:
!pachctl list pipeline

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Alt+Tab: Check Pipeline in the MLDM UI
</div>

## Creating the Final Report pipeline

In [None]:
!cat ../pipelines/5_final_report.yaml

#### Get a list of datums for this pipeline (based on the glob pattern used)

In [None]:
!pachctl list datums -f ../pipelines/5_final_report.yaml

#### Create the pipeline

In [None]:
!pachctl create pipeline -f ../pipelines/5_final_report.yaml

In [None]:
!pachctl list pipeline

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Alt+Tab: Check Pipeline in the MLDM UI
</div>

## Creating the Serve Report pipeline

In [None]:
!cat ../pipelines/6_serve.yaml

#### Get a list of datums for this pipeline (based on the glob pattern used)

In [None]:
!pachctl list datums -f ../pipelines/6_serve.yaml

#### Create the pipeline

In [None]:
!pachctl create pipeline -f ../pipelines/6_serve.yaml

In [None]:
!pachctl list pipeline

In [None]:
!pachctl inspect pipeline serve_report --raw -o json

In [None]:
!pachctl inspect pipeline serve_report --raw -o json | jq -r '.details.service.ip'

In [None]:
report_ip = !pachctl inspect pipeline serve_report --raw -o json | jq -r '.details.service.ip'
report_ip = report_ip[0]

In [None]:
print(f"http://{report_ip}:8080/full_report.html")
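
When the service has no IP assigned yet, `jq -r` prints the literal string `null`, and the URL above becomes `http://null:8080/full_report.html`. A minimal sketch of a more defensive version follows. It assumes the inspect output is a single JSON object with the `details.service.ip` field used above. The helper `report_url` and the sample JSON (which uses a documentation-only IP address) are hypothetical.

```python
import json


def report_url(inspect_json, port=8080, page="full_report.html"):
    """Build the report URL from the pipeline's inspect output, or raise if no IP is set."""
    info = json.loads(inspect_json)
    service = (info.get("details") or {}).get("service") or {}
    ip = service.get("ip")
    if not ip:
        raise ValueError("the pipeline has no service IP yet; retry in a moment")
    return f"http://{ip}:{port}/{page}"


# Hypothetical stand-in for `pachctl inspect pipeline serve_report --raw -o json`.
sample = '{"details": {"service": {"ip": "203.0.113.10"}}}'
print(report_url(sample))
```

Raising an error early makes it clear that the service is still starting, rather than handing the browser an unusable address.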

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Alt+Tab: Grab the service IP and go to http://service_ip:8080/full_report.html to see the report<br/>
    Delete the pipeline when you're done to release the CPU.
</div>

### Delete the Pipeline

In [None]:
!pachctl delete pipeline serve_report

&nbsp;

# Model Training

### Create the Model Training pipeline

In [None]:
!cat ../pipelines/7_train_model.yaml

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Important: Edit the <i>7_train_model.yaml</i> file and review the values for:
    <ul><li>workspace (The MLDE workspace for this workshop)</li>
        <li>mlde_project (The MLDE project for your user)</li>
        <li>project (The MLDM project you created for this lab)</li>
    </ul>
</div>

#### Get a list of datums for this pipeline (based on the glob pattern used)

In [None]:
!pachctl list datums -f ../pipelines/7_train_model.yaml

#### Create the pipeline

In [None]:
!pachctl create pipeline -f ../pipelines/7_train_model.yaml

In [None]:
!pachctl list pipeline

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Alt+Tab:<br/>Check Pipeline in the MLDM UI<br/>Check Experiment in MLDE UI<br/>
</div>

# Model Deployment

### Create the Model Deployment pipeline

In [None]:
!cat ../pipelines/8_deploy_model.yaml

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Important: Edit the <i>8_deploy_model.yaml</i> file and review the values for:
    <ul><li>deployment-name (must be unique)</li> </ul>
</div>

#### Get a list of datums for this pipeline (based on the glob pattern used)

In [None]:
!pachctl list datums -f ../pipelines/8_deploy_model.yaml

#### Create the pipeline

In [None]:
!pachctl create pipeline -f ../pipelines/8_deploy_model.yaml

In [None]:
!pachctl list pipeline

<br/>
<div style="font-size:20px;color:maroon;font-family: 'Courier New';font-weight:bold">
    Alt+Tab:<br/>Check Pipeline in the MLDM UI<br/>
    How do you test the deployed model? Stay tuned for the next lab!<br/>
</div>

 &nbsp;

# Congratulations! The Data Preparation lab is complete!