# Deployment Lab

This notebook walks through the steps required to deploy the CNN model on Watson Machine Learning Accelerator (WMLA) using the Elastic Distributed Inference (EDI) service. We use the `dlim` CLI tool to interact with WMLA EDI from Watson Studio (WS). With `dlim` you can submit a deployment and perform related operations, such as starting or stopping the deployment and updating its configuration.

**Skills covered in this lab**
- Setting up the `dlim` CLI tool from WS
- Preparing a deployment submission folder with all file requirements
- Deploying a model to WMLA EDI
- Updating deployment configurations
- Querying, starting, stopping, and undeploying a model deployment

## 0. Requirements

The following are required before we can deploy a model:

1. `dlim` CLI tool
    - `rest-server` &rarr; `https://wmla-console-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/`
    - `jwt-token` &rarr; access token, either generated or obtained from the WS environment variable `USER_ACCESS_TOKEN`

2. Deployment submission directory containing
    - `kernel.py` &rarr; Deployment driver used by WMLA EDI; it defines what happens when a deployment kernel starts or an inference request is received
    - `model.json` &rarr; Specifies configurations for your deployment
    - `README.md` &rarr; Describes your deployment, including the expected payload and response structure/behavior

## 1. Setup

In [2]:
import os
import json

import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

### 1.1 Set up dlim

* Make sure the `dlim` tool is available locally
* Add its directory to the `PATH` variable for easier execution

In [4]:
dlim_path = '/userfs'  # directory that contains the dlim binary
if os.path.exists(f'{dlim_path}/dlim'):
    print('dlim program found...adding to PATH variable')
    
    # Compare against whole PATH entries rather than substrings
    if dlim_path not in os.environ['PATH'].split(os.pathsep):
        os.environ['PATH'] = os.environ['PATH'] + f'{os.pathsep}{dlim_path}'
        print(f'Added {dlim_path} to PATH variable')
else:
    print(f'dlim not found in {dlim_path}...did you type the path correctly?')

dlim program found...adding to PATH variable
Added /userfs to PATH variable


* `dlim` requires the `--rest-server` and `--jwt-token` arguments
    - `--rest-server` takes the form `https://<wmla host>/dlim/v1/`
    - `USER_ACCESS_TOKEN` is available as an environment variable within Watson Studio and can be supplied to `--jwt-token`

In [5]:
# Set as an environment variable so shell commands can reference $REST_SERVER
WMLA_HOST = 'https://wmla-console-cpd-wmla.apps.cpd.mskcc.org'
os.environ['REST_SERVER'] = f'{WMLA_HOST}/dlim/v1/'

In [6]:
# Test dlim
!dlim model list --rest-server $REST_SERVER --jwt-token $USER_ACCESS_TOKEN

/usr/bin/sh: /userfs/dlim: Permission denied
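
The `Permission denied` message above indicates that the shell located `/userfs/dlim` but could not execute it, which usually means the file lacks the execute bit. The following is a minimal sketch of one way to set that bit from the notebook, assuming you own the file and the filesystem allows permission changes. `make_executable` is a hypothetical helper name and is not part of `dlim`.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for the file owner and report whether it is now executable."""
    current_mode = os.stat(path).st_mode
    os.chmod(path, current_mode | stat.S_IXUSR)
    return os.access(path, os.X_OK)


# Example usage (path taken from the setup cell above):
# make_executable('/userfs/dlim')
```

After the permission is set, rerunning the `dlim model list` test cell should no longer fail at the shell level.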


### 1.2 Prepare WMLA submission requirements

* Required files must be submitted together in a submission folder
* This folder **must** contain the kernel file, `model.json`, and `README.md`; otherwise the submission will fail
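
Because a missing file causes the submission to fail, it can help to check the folder before deploying. The sketch below assumes the kernel file is named `kernel.py`, as it is in this lab; `missing_submission_files` and `REQUIRED_FILES` are hypothetical names introduced for illustration.

```python
import os

# File names expected in the submission folder (kernel name assumed to be kernel.py)
REQUIRED_FILES = ('kernel.py', 'model.json', 'README.md')


def missing_submission_files(submission_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from submission_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(submission_dir, name))]


# Example usage:
# missing = missing_submission_files(os.environ['DIR_DEPLOY_SUBMISSION'])
# if missing:
#     print(f'Submission folder is missing: {missing}')
```

An empty list means all three files are present.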

#### 1.2.1 Create submission folder

In [1]:
import os  # repeated here so this cell also runs on its own

os.environ['DIR_DEPLOY_SUBMISSION'] = '/userfs/deploy_submissions/enablement-cnn'
os.makedirs(os.environ['DIR_DEPLOY_SUBMISSION'], exist_ok=True)

#### 1.2.2 Create kernel file

In [6]:
# Contents of kernel file

file_content = '''#!/usr/bin/env python

import traceback
import time

from datetime import datetime,timezone
import json

import redhareapiversion
from redhareapi import Kernel

class MatchKernel(Kernel):
    def on_kernel_start(self, kernel_context):
        pass
        
    def on_task_invoke(self, task_context):
        try:
            Kernel.log_debug("on_task_invoke")
            # Walk the chain of task contexts via next() until none remain
            while task_context is not None:
                Kernel.log_debug(f"Task ID: {task_context.get_id()}")
                # Parse payload data
                Kernel.log_debug("Parsing payload")
                input_data = json.loads(task_context.get_input_data())
                
                # Prepare response: echo the payload back unchanged
                Kernel.log_debug("Preparing response")
                task_context.set_output_data(json.dumps(input_data))
                task_context = task_context.next()
                            
        except Exception as e:
            traceback.print_exc()
            Kernel.log_error(f"Failed due to {str(e)}")
    
    def on_kernel_shutdown(self):
        pass

        
if __name__ == '__main__':
    obj_kernel = MatchKernel()
    obj_kernel.run()
'''

# Write to submission directory
with open(os.environ['DIR_DEPLOY_SUBMISSION']+'/kernel.py', 'w') as f:
    f.write(file_content)

#### 1.2.3 Create README.md file

In [7]:
file_content = '''# Description
Echoes the request payload back in the response.

## Payload
- Any JSON value, for example `{"data": "123"}`

## Response
- The same JSON value that was sent in the payload
'''

# Write to submission directory
with open(os.environ['DIR_DEPLOY_SUBMISSION']+'/README.md', 'w') as f:
    f.write(file_content)

#### 1.2.4 Create model.json file
* Set deployment and file names

In [8]:
# Deployment name in WMLA
DEPLOY_NAME = 'ping-pong-test'
os.environ['DEPLOY_NAME'] = DEPLOY_NAME
KERNEL_FILENAME = 'kernel.py'
README_FILENAME = 'README.md'

In [9]:
file_content = '''{"name": "__PLACEHOLDER__", 
"kernel_path": "__PLACEHOLDER__", 
 "readme": "__PLACEHOLDER__",
 "tag": "test", 
 "weight_path": "./",  
 "runtime": "dlipy3", 
 "framework": "PyTorch", 
 "schema_version": "1"}
'''

# Write to submission directory
with open(os.environ['DIR_DEPLOY_SUBMISSION']+'/model.json', 'w') as f:
    f.write(file_content)

In [10]:
# Fill in the placeholders
model_json_path = f'{os.environ["DIR_DEPLOY_SUBMISSION"]}/model.json'

with open(model_json_path) as f:
    conf = json.load(f)

conf['name'] = DEPLOY_NAME
conf['kernel_path'] = KERNEL_FILENAME
conf['readme'] = README_FILENAME

with open(model_json_path, 'w') as f:
    json.dump(conf, f)

# Read the file back to confirm what was written
with open(model_json_path) as f:
    conf = json.load(f)
conf

{'name': 'ping-pong-test',
 'kernel_path': 'kernel.py',
 'readme': 'README.md',
 'tag': 'test',
 'weight_path': './',
 'runtime': 'dlipy3',
 'framework': 'PyTorch',
 'schema_version': '1'}

## 2. Submit deployment
* If a deployment with the same name already exists, be sure to first stop and undeploy it
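
The stop-then-undeploy sequence mentioned above can be sketched in Python as follows. This is an illustrative sketch only: `dlim_command` and `stop_and_undeploy` are hypothetical helper names, and the commands use only the `dlim` subcommands and flags that appear elsewhere in this notebook (`stop`, `undeploy`, `--rest-server`, `--jwt-token`, `-f`).

```python
import subprocess


def dlim_command(action, name, rest_server, jwt_token, force=True):
    """Build the argument list for `dlim model <action> <name>`."""
    cmd = ['dlim', 'model', action, name,
           '--rest-server', rest_server, '--jwt-token', jwt_token]
    if force:
        cmd.append('-f')  # skip the interactive confirmation
    return cmd


def stop_and_undeploy(name, rest_server, jwt_token):
    """Run `stop` and then `undeploy`, printing whatever dlim reports."""
    for action in ('stop', 'undeploy'):
        result = subprocess.run(dlim_command(action, name, rest_server, jwt_token),
                                capture_output=True, text=True)
        print(result.stdout or result.stderr)


# Example usage:
# stop_and_undeploy(DEPLOY_NAME, os.environ['REST_SERVER'],
#                   os.environ['USER_ACCESS_TOKEN'])
```

Note that `stop` returns before the deployment has fully stopped, so the `undeploy` call may be rejected if it runs too soon; in that case, rerun it once the deployment is no longer started.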

In [11]:
!dlim model deploy -p $DIR_DEPLOY_SUBMISSION --rest-server $REST_SERVER --jwt-token $USER_ACCESS_TOKEN

Uploading...
</userfs/deploy_submissions/ping-pong-test/.ipynb_checkpoints/README-checkpoint.md> uploaded to server.
</userfs/deploy_submissions/ping-pong-test/.ipynb_checkpoints/kernel-checkpoint.py> uploaded to server.
</userfs/deploy_submissions/ping-pong-test/.ipynb_checkpoints/model-checkpoint.json> uploaded to server.
</userfs/deploy_submissions/ping-pong-test/README.md> uploaded to server.
</userfs/deploy_submissions/ping-pong-test/kernel.py> uploaded to server.
</userfs/deploy_submissions/ping-pong-test/model.json> uploaded to server.
</userfs/deploy_submissions/ping-pong-test/update_model.json> uploaded to server.
Registering...
Model <ping-pong-test> is deployed successfully
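
The upload log above shows that Jupyter's `.ipynb_checkpoints` copies were uploaded along with the real files. They are not needed by the deployment. A minimal sketch for clearing them from the submission folder before running `dlim model deploy` is shown below; `remove_checkpoint_dirs` is a hypothetical helper name.

```python
import os
import shutil


def remove_checkpoint_dirs(root):
    """Delete every `.ipynb_checkpoints` directory under root and return the removed paths."""
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        if '.ipynb_checkpoints' in dirnames:
            target = os.path.join(dirpath, '.ipynb_checkpoints')
            shutil.rmtree(target)
            # Prevent os.walk from descending into the directory just removed
            dirnames.remove('.ipynb_checkpoints')
            removed.append(target)
    return removed


# Example usage:
# remove_checkpoint_dirs(os.environ['DIR_DEPLOY_SUBMISSION'])
```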


## 3. Modify configuration
* You must first stop a deployment before updating its configuration profile
* The `-f` argument forces the command and avoids user confirmation

In [12]:
!dlim model stop $DEPLOY_NAME --rest-server $REST_SERVER --jwt-token $USER_ACCESS_TOKEN -f

Stopping model "ping-pong-test", run "dlim model view ping-pong-test -s" to ensure stop.


* The `viewprofile` dlim command with the `-j` argument returns the current profile as JSON
* We then modify this JSON to apply advanced configuration settings

In [13]:
!dlim model viewprofile $DEPLOY_NAME -j --rest-server $REST_SERVER --jwt-token $USER_ACCESS_TOKEN > $DIR_DEPLOY_SUBMISSION/update_model.json

In [14]:
with open(f"{os.environ['DIR_DEPLOY_SUBMISSION']}/update_model.json",'r') as f:
    update_model = json.load(f)
    
# Request exclusive GPU mode for the kernel
update_model['kernel']['gpu'] = 'exclusive'

# Save updated JSON
with open(f"{os.environ['DIR_DEPLOY_SUBMISSION']}/update_model.json",'w') as f:
    json.dump(update_model, f)

* Use the `updateprofile` command to submit the new JSON

In [15]:
!dlim model updateprofile $DEPLOY_NAME -f $DIR_DEPLOY_SUBMISSION/update_model.json --rest-server $REST_SERVER --jwt-token $USER_ACCESS_TOKEN

Model is updated successfully


## 4. Start the deployment

In [16]:
!dlim model start $DEPLOY_NAME --rest-server $REST_SERVER --jwt-token $USER_ACCESS_TOKEN

Starting model "ping-pong-test", run "dlim model view ping-pong-test -s" to ensure startup.


* Confirm that the model is deployed and in the `Started` state

In [17]:
!dlim model view $DEPLOY_NAME -s --rest-server $REST_SERVER --jwt-token $USER_ACCESS_TOKEN

Name:             ping-pong-test
State:            Started
Serving replica:  1
Serving service ID:   2df9c9bb-5a1c-48a2-b9de-37dcc63ef846
Service JobID:        edi-ping-pong-test-6546dff6f8-gvnj5
GPU Mode:             exclusive
Served clients:       0
Pending requests:     0
Requests per second:  0.00
Data per second:      0.00
Kernel started:       1
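
To check the state programmatically rather than by eye, the `Key: value` lines of this status output can be parsed into a dictionary. The sketch below assumes the output keeps the layout shown above; `parse_dlim_view` is a hypothetical helper name, and the sample text is an excerpt of the output from the previous cell.

```python
def parse_dlim_view(text):
    """Parse `Key: value` lines from `dlim model view <name> -s` output into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep and key.strip():
            status[key.strip()] = value.strip()
    return status


# Excerpt of the status output shown above
sample = """Name:             ping-pong-test
State:            Started
Serving replica:  1
GPU Mode:             exclusive
Kernel started:       1
"""

status = parse_dlim_view(sample)
print(status['State'] == 'Started')  # prints True
```

In a notebook, the text to parse could be captured with `subprocess.run(..., capture_output=True, text=True).stdout` instead of the `!` shell syntax.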


## 5. Test deployment

In [18]:
DEPLOYMENT_URL = f'https://wmla-inference-cpd-wmla.apps.cpd.mskcc.org/dlim/v1/inference/{DEPLOY_NAME}'
headers = {'Authorization': f'Bearer {os.getenv("USER_ACCESS_TOKEN")}'}
data = {'data':'123'}

In [19]:
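# verify=False skips TLS certificate verification; use it only against a trusted cluster
r = requests.post(DEPLOYMENT_URL, headers=headers,
                  json=data, verify=False)

if r.status_code == 200:
    print(r.text)
else:
    print(f'Error with request: HTTP {r.status_code}')
    print(r.text)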

{"data": "123"}


## 6. Undeploy the model

* To undeploy the model, first make sure it is stopped

In [20]:
!dlim model stop $DEPLOY_NAME --rest-server $REST_SERVER --jwt-token $USER_ACCESS_TOKEN -f

Stopping model "ping-pong-test", run "dlim model view ping-pong-test -s" to ensure stop.


In [21]:
!dlim model undeploy $DEPLOY_NAME --rest-server $REST_SERVER --jwt-token $USER_ACCESS_TOKEN -f

The model cannot be removed if it's started.
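
The message above shows that `undeploy` ran before the preceding `stop` had taken effect. One way to avoid this is to poll the deployment state and undeploy only after it has changed. The following is a minimal sketch of such a polling loop; `wait_for` is a hypothetical helper, and the caller supplies `get_state`, a function that returns the current state string (for example, by running `dlim model view $DEPLOY_NAME -s` and reading the `State:` line).

```python
import time


def wait_for(get_state, is_done, timeout=120.0, interval=5.0, sleep=time.sleep):
    """Poll get_state() until is_done(state) is true or timeout seconds elapse.

    Returns the last observed state; raises TimeoutError if is_done never held.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if is_done(state):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f'still in state {state!r} after {timeout} seconds')
        sleep(interval)


# Example usage (the 'Stopped' state name is an assumption; this notebook only
# shows the 'Started' state, so confirm the exact string on your cluster):
# wait_for(get_state, lambda s: s == 'Stopped')
```

Once `wait_for` returns, the `dlim model undeploy` cell can be rerun.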
