# CPDCTL Samples for Notebooks and Environments in Spaces

CPDCTL is a command-line interface (CLI) you can use to manage the lifecycle of notebooks. By using the notebook CLI, you can automate the flow for creating notebooks and running notebook jobs, as well as promoting notebooks from a project to a space.   

This notebook begins by showing you how to install and configure CPDCTL and is then split up into three sections with examples of how to use the commands for:

- Creating notebooks and running notebook jobs
- Creating code packages and running code package jobs
- Promoting notebooks from a project to a space

## Table of Contents

[1. Installing and configuring CPDCTL](#part1)
- [1.1 Installing the latest version of CPDCTL](#part1.1)
- [1.2 Adding CPD cluster configuration settings](#part1.2)

[2. Demo 1: Creating a notebook asset and running a job](#part2)
- [2.1 Creating a notebook asset](#part2.1)
- [2.2 Running a job](#part2.2)

[3. Demo 2: Creating a code package asset and running a job](#part3)
- [3.1 Creating a code package asset](#part3.1)
- [3.2 Running a job](#part3.2)

[4. Demo 3: Promoting a notebook from a project to a space](#part4)

## Before you begin
Import the following libraries:

In [2]:
import base64
import json
import os
import requests
import platform
import tarfile
import zipfile
from IPython.display import display, HTML

## 1. Installing and configuring CPDCTL <a class="anchor" id="part1"></a>

### 1.1 Installing the latest version of CPDCTL <a class="anchor" id="part1.1"></a>

To use the notebook and environment CLI commands, you need to install CPDCTL. The binaries are published on the [CPDCTL GitHub repository](https://github.com/IBM/cpdctl/releases).

Download the binary for your platform and then display the version number:

In [3]:
PLATFORM = platform.system().lower()
CPDCTL_ARCH = "{}_amd64".format(PLATFORM)
CPDCTL_RELEASES_URL="https://api.github.com/repos/IBM/cpdctl/releases"
CWD = os.getcwd()
PATH = os.environ['PATH']
CPD_CONFIG = os.path.join(CWD, '.cpdctl.config.yml')

response = requests.get(CPDCTL_RELEASES_URL)
assets = response.json()[0]['assets']
platform_asset = next(a for a in assets if CPDCTL_ARCH in a['name'])
cpdctl_url = platform_asset['url']
cpdctl_file_name = platform_asset['name']
        
response = requests.get(cpdctl_url, headers={'Accept': 'application/octet-stream'})
with open(cpdctl_file_name, 'wb') as f:
    f.write(response.content)
    
display(HTML('<code>cpdctl</code> binary downloaded from: <a href="{}">{}</a>'.format(platform_asset['browser_download_url'], platform_asset['name'])))
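The download cell picks the first asset whose name contains `CPDCTL_ARCH` and assumes an amd64 machine; if nothing matches, `next()` raises a bare `StopIteration`. The following is a minimal sketch of a more defensive selection. The helper name `select_release_asset`, the architecture mapping, and the sample asset names are assumptions made for illustration, not a list of what the CPDCTL project actually publishes.

```python
import platform


def select_release_asset(assets, system=None, machine=None):
    """Return the first release asset matching the OS and CPU architecture.

    `assets` is a list of dicts with a 'name' key, like the 'assets' array
    returned by the GitHub releases API. The architecture mapping below is
    an assumption for illustration only.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    arch_map = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}
    arch = arch_map.get(machine, machine)
    wanted = "{}_{}".format(system, arch)
    # Pass a default to next() so a missing asset gives a readable error.
    asset = next((a for a in assets if wanted in a["name"]), None)
    if asset is None:
        raise LookupError("no release asset found for '{}'".format(wanted))
    return asset


# Illustrative data only; real asset names come from the releases API.
sample_assets = [
    {"name": "cpdctl_darwin_amd64.tar.gz"},
    {"name": "cpdctl_linux_amd64.tar.gz"},
]
chosen = select_release_asset(sample_assets, system="Linux", machine="x86_64")
print(chosen["name"])
```

Passing `system` and `machine` explicitly keeps the helper easy to try out on a machine other than the one the notebook will run on.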

In [4]:
%%capture

%env PATH={CWD}:{PATH}
%env CPD_CONFIG={CPD_CONFIG}

In [5]:
if cpdctl_file_name.endswith('tar.gz'):
    with tarfile.open(cpdctl_file_name, "r:gz") as tar:
        tar.extractall()
elif cpdctl_file_name.endswith('zip'):
    with zipfile.ZipFile(cpdctl_file_name, 'r') as zf:
        zf.extractall()

if CPD_CONFIG and os.path.exists(CPD_CONFIG):
    os.remove(CPD_CONFIG)
    
version_r = ! cpdctl version
CPDCTL_VERSION = version_r.s

print("cpdctl version: {}".format(CPDCTL_VERSION))

cpdctl version: 1.1.98


### 1.2 Adding CPD cluster configuration settings <a class="anchor" id="part1.2"></a>

Before you can use CPDCTL, you need to add configuration settings. You only need to configure these settings once for the same IBM Cloud Pak for Data (CPD) user and cluster. Begin by entering your CPD credentials and the URL to the CPD cluster:

In [6]:
CPD_USER_NAME = 'ritchie'
CPD_USER_PASSWORD = 'Enigma'
CPD_URL = 'https://cpd-cp4d.apps.10.99.103.31.nip.io/zen/'
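Hard-coding credentials in a cell is convenient for a demo, but they are easy to leak when the notebook is shared. One alternative, sketched below, is to read the values from environment variables. The variable names and the helper `load_cpd_settings` are this sketch's own convention; they are not settings that cpdctl reads by itself.

```python
import os


def load_cpd_settings(env=None):
    """Read CPD connection settings from a mapping (defaults to os.environ).

    The variable names are a convention of this sketch, not of cpdctl.
    """
    env = os.environ if env is None else env
    names = ("CPD_USER_NAME", "CPD_USER_PASSWORD", "CPD_URL")
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise KeyError("missing settings: {}".format(", ".join(missing)))
    return {n: env[n] for n in names}


# Example with an explicit mapping instead of the real environment.
settings = load_cpd_settings({
    "CPD_USER_NAME": "example-user",
    "CPD_USER_PASSWORD": "example-password",
    "CPD_URL": "https://cpd.example.com",
})
print(settings["CPD_USER_NAME"], settings["CPD_URL"])
```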

Add the "cpd_user" user to the cpdctl configuration:

In [7]:
! cpdctl config user set cpd_user --username {CPD_USER_NAME} --password {CPD_USER_PASSWORD}

Add the "cpd" cluster profile to the cpdctl configuration:

In [8]:
! cpdctl config profile set cpd --url {CPD_URL}

Failed to check CP4D instance version. Verify profile URL "https://cpd-cp4d.apps.10.99.103.31.nip.io/zen/".
Get "https://cpd-cp4d.apps.10.99.103.31.nip.io/zen/diag": dial tcp: lookup cpd-cp4d.apps.10.99.103.31.nip.io on 172.30.0.10:53: no such host


Add the "cpd" context to the cpdctl configuration:

In [9]:
! cpdctl config context set cpd --profile cpd --user cpd_user

List available contexts:

In [10]:
! cpdctl config context list

Name                          Profile                       User                       Current   
inClusterEnvironmentContext   inClusterEnvironmentProfile   inClusterEnvironmentUser   *   


If the context you want to use is not marked in the `Current` column, switch to it. This example uses the `inClusterEnvironmentContext` context shown in the listing above:

In [14]:
! cpdctl config context use inClusterEnvironmentContext

Switched to context "inClusterEnvironmentContext".
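cpdctl prints its tables with colour and bold styling. When you capture that output in Python, the styling may show up as ANSI escape sequences mixed into the text. The sketch below strips those sequences before you print or parse the output; `strip_ansi` is a hypothetical helper name, and the sample string is hand-written for illustration.

```python
import re

# Matches ANSI SGR (colour/style) sequences such as ESC[1m or ESC[36;1m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text):
    """Remove ANSI colour and style escape sequences from a string."""
    return ANSI_SGR.sub("", text)


raw = "\x1b[1mName\x1b[0m   \x1b[36;1minClusterEnvironmentContext\x1b[0m"
clean = strip_ansi(raw)
print(clean)
```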


List the spaces available in the current context:

In [15]:
! cpdctl space list

...
ID                                     Name              Created                    Description   State    Tags   
68d65b57-9da2-4e24-93be-87f2eb2f2140   Credit Default    2021-10-28T22:04:01.776Z                 active   []   
8a39af86-076d-47f3-a9a5-0927361cd430   Baufinanzierung   2021-10-29T12:45:35.851Z                 active   []   
30093262-7220-496f-997a-096be73471cf   Churn_Analyse     2021-11-22T12:14:12.660Z                 active   []   
f2124828-0125-45e8-8dda-e8b8ad142fb6   Test              2022-02-17T20:51:38.662Z                 active   []   


Choose the space in which you want to work:

In [16]:
#result = ! cpdctl space list --output json -j "(resources[].metadata.id)[0]" --raw-output
#space_id = result.s
#print("space id: {}".format(space_id))

# You can also specify your space id directly:
space_id = "f2124828-0125-45e8-8dda-e8b8ad142fb6"
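The commented-out query above takes the first space in the list. If you would rather select a space by name, one option is to parse the JSON output in Python. The sketch below assumes each resource carries its ID under `metadata.id` (as in the query above) and its name under `entity.name`; the location of the name is an assumption, and `find_space_id` is a hypothetical helper. The sample payload is hand-written from the listing above.

```python
import json


def find_space_id(spaces_json, name):
    """Return the ID of the first space whose name matches, or None.

    Assumes the shape {"resources": [{"metadata": {"id": ...},
    "entity": {"name": ...}}]}; the location of the name is an assumption.
    """
    payload = json.loads(spaces_json)
    for resource in payload.get("resources", []):
        if resource.get("entity", {}).get("name") == name:
            return resource["metadata"]["id"]
    return None


sample = json.dumps({
    "resources": [
        {"metadata": {"id": "68d65b57-9da2-4e24-93be-87f2eb2f2140"}, "entity": {"name": "Credit Default"}},
        {"metadata": {"id": "f2124828-0125-45e8-8dda-e8b8ad142fb6"}, "entity": {"name": "Test"}},
    ]
})
print(find_space_id(sample, "Test"))
```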

## 2. Demo 1: Creating a notebook asset and running a job <a class="anchor" id="part2"></a>

Before starting with this section, ensure that you have run the cells in [Section 1](#part1) and specified the ID of the space in which you will work.

Suppose you have a Jupyter Notebook (.ipynb) file on your local system and you would like to run the code in the file as a job on a CPD cluster. This section shows you how to create a notebook asset and run a job on a CPD cluster. 

### 2.1 Creating a notebook asset <a class="anchor" id="part2.1"></a>

First, create a notebook asset in your space. To create a notebook asset, you need to specify:
- The environment in which your notebook will run
- A notebook file (.ipynb)

List all the environments in your space, filter them by display name, and get the ID of the environment in which your notebook will run:

In [17]:
environment_name = "Default Python 3.8"
query_string = "(resources[?entity.environment.display_name == '{}'].metadata.asset_id)[0]".format(environment_name)

In [18]:
result = ! cpdctl environment list --space-id {space_id} --output json -j "{query_string}" --raw-output
env_id = result.s
print("environment id: {}".format(env_id))

# You can also specify your environment id directly:
# env_id = "Your environment ID"

environment id: jupconda38-f2124828-0125-45e8-8dda-e8b8ad142fb6


Upload the .ipynb file:

In [25]:
remote_file_path = "notebook/cpdctl-test-notebook.ipynb"
local_file_path = "cpdctl-test-notebook.ipynb"

In [26]:
! cpdctl asset file upload --path {remote_file_path} --file {local_file_path} --space-id {space_id}

...
FAILED
Error opening file 'cpdctl-test-notebook.ipynb':
open cpdctl-test-notebook.ipynb: no such file or directory
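The upload above fails because `cpdctl-test-notebook.ipynb` does not exist in the working directory. If you have no notebook at hand, the sketch below writes a minimal one-cell notebook in the nbformat 4 JSON layout. The cell content is illustrative and is not the original test notebook; `write_minimal_notebook` is a hypothetical helper.

```python
import json
import os
import tempfile


def write_minimal_notebook(path):
    """Write a one-cell Jupyter notebook (nbformat 4) to `path`."""
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": ["for i in range(5):\n", "    print(i * i)"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)
    return path


# Written to a temporary directory here; in the notebook you would use the
# working directory so that `local_file_path` resolves.
demo_path = write_minimal_notebook(
    os.path.join(tempfile.mkdtemp(), "cpdctl-test-notebook.ipynb")
)
print(os.path.isfile(demo_path))
```

Checking `os.path.isfile(local_file_path)` before calling the upload command gives a clearer failure than the CLI error shown above.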



Create a notebook asset:

In [21]:
file_name = "cpdctl-test-notebook.ipynb"
runtime = {
    'environment': env_id
}
runtime_json = json.dumps(runtime)

In [22]:
result = ! cpdctl notebook create --file-reference {remote_file_path} --name {file_name} --space {space_id} --runtime '{runtime_json}' --output json -j "metadata.asset_id" --raw-output
notebook_id = result.s
print("notebook id: {}".format(notebook_id))

notebook id: FAILED   trace                                  code                                              message    f6cb96d5-02ba-4910-9344-cebe578d5e9a   asset_attachment_create_prerequisites_not_found   Could not create attachment for notebook asset. CAMS error code: dependent_service_error. CAMS error message: NGPDL4108E: Object does not exist. key='notebook/cpdctl-test-notebook.ipynb'   
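When a command fails, `result.s` holds the error text rather than an ID, and the following cells pass that text on as if it were valid. A small check before continuing can stop the chain early. The sketch below tests whether a captured value looks like a hyphenated UUID; `is_uuid` and `require_uuid` are hypothetical helpers. Environment IDs such as `jupconda38-...` are not plain UUIDs, so this check fits notebook, job, and run IDs only.

```python
import re

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def is_uuid(value):
    """Return True if `value` is a hyphenated UUID string."""
    return bool(UUID_RE.match(value.strip()))


def require_uuid(value, label):
    """Return the stripped value, or raise ValueError quoting the CLI output."""
    if not is_uuid(value):
        raise ValueError("{} is not a valid ID: {}".format(label, value[:80]))
    return value.strip()


print(is_uuid("5b81c35c-4ccd-4090-993e-62cf5f6dd6d7"))
print(is_uuid("FAILED   trace   code   message"))
```

In the notebook, a call such as `notebook_id = require_uuid(result.s, "notebook id")` would raise at the point of failure instead of two cells later.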


### 2.2 Running a job <a class="anchor" id="part2.2"></a>

To create a notebook job, you need to give your job a name, add a description, and pass the notebook ID and environment ID you determined in [2.1](#part2.1). Additionally, you can add environment variables that will be used in your notebook:

In [28]:
job_name = "cpdctl-test-job"
job = {
    'asset_ref': notebook_id, 
    'configuration': {
        'env_id': env_id, 
        'env_variables': [
            'foo=1', 
            'bar=2'
        ]
    }, 
    'description': 'my job', 
    'name': job_name
}
job_json = json.dumps(job)
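The next cell wraps `{job_json}` in single quotes for the shell. That works as long as the JSON itself contains no single quote, for example in the description. A more robust option, sketched below, is to quote the string with `shlex.quote` and interpolate the result without surrounding quotes. The job definition here is a shortened illustration, and the round trip through `shlex.split` only simulates how a POSIX shell would read the argument.

```python
import json
import shlex

job = {
    "name": "cpdctl-test-job",
    "description": "it's my job",  # contains a single quote on purpose
    "configuration": {"env_variables": ["foo=1", "bar=2"]},
}
job_json = json.dumps(job)

# shlex.quote returns a string that a POSIX shell reads back as one argument.
job_arg = shlex.quote(job_json)

# Round trip: split the command line the way a POSIX shell would.
argv = shlex.split("cpdctl job create --job " + job_arg)
print(argv[-1] == job_json)
```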

In [29]:
result = ! cpdctl job create --job '{job_json}' --space-id {space_id} --output json -j "metadata.asset_id" --raw-output
job_id = result.s
print("job id: {}".format(job_id))

job id: FAILED                  Code:      404    Message:   An error occurred while retrieving the asset FAILED   trace                                  code                                              message    f6cb96d5-02ba-4910-9344-cebe578d5e9a   asset_attachment_create_prerequisites_not_found   Could not create attachment for notebook asset. CAMS error code: dependent_service_error. CAMS error message: NGPDL4108E: Object does not exist. key=notebook/cpdctl-test-notebook.ipynb   -1645741334933.    Error:     Not Found    Reason:    HTTP 404 Not Found   


Run a notebook job:

In [30]:
job_run = {
    'configuration': {
        'env_variables': [
            'key1=value1', 
            'key2=value2'
        ]
    }
}
job_run_json = json.dumps(job_run)
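The `env_variables` entry is a list of `key=value` strings. If your variables already live in a dictionary, a small conversion helper avoids formatting mistakes. This is a minimal sketch; `to_env_variables` is a hypothetical helper, not part of cpdctl.

```python
import json


def to_env_variables(mapping):
    """Convert {'key1': 'value1'} into ['key1=value1'], preserving order."""
    entries = []
    for key, value in mapping.items():
        if not key or "=" in key:
            raise ValueError("invalid variable name: {!r}".format(key))
        entries.append("{}={}".format(key, value))
    return entries


job_run = {
    "configuration": {
        "env_variables": to_env_variables({"key1": "value1", "key2": "value2"})
    }
}
job_run_json = json.dumps(job_run)
print(job_run_json)
```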

In [31]:
result = ! cpdctl job run create --space-id {space_id} --job-id {job_id} --job-run '{job_run_json}' --output json -j "metadata.asset_id" --raw-output
run_id = result.s
print("run id: {}".format(run_id))

run id: Error: unknown shorthand flag: '1' in -1645741334933. Usage:   cpdctl job run create --job-id JOB-ID --job-run JOB-RUN [--project-id PROJECT-ID] [--space-id SPACE-ID]  Flags:       --async               Run the command asynchronously. By default it waits for the processing to finish.       --cpd-scope string    CPD space or project scope, e.g. 'cpd://default-context/spaces/7bccdda4-9752-4f37-868e-891de6c48135'   -h, --help                help for create       --job string          Job definition in the same format as in 'job create' command. If set, the job is created prior to starting the job run unless a job with the specified name already exists. The flags --job and -job-id are mutually exclusive.       --job-id string       The ID of the job to use. Each job has a unique ID.       --job-run string      Configuration for the job run.       --project-id string   The ID of the project to use. project_id or space_id is required.       --space-id string     The ID of the space t

You can see the output of each cell in your .ipynb file by listing job run logs:

In [23]:
! cpdctl job run logs --job-id {job_id} --run-id {run_id} --space-id {space_id}

...

Cell 1:
0
1
4
9
16
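The log output groups printed lines under `Cell N:` headings. If you want to check a run programmatically, you can split the captured lines by cell. The sketch below assumes the log layout shown above and nothing more; `parse_cell_logs` is a hypothetical helper, and the sample lines are copied from the output above.

```python
import re

CELL_HEADER = re.compile(r"^Cell (\d+):\s*$")


def parse_cell_logs(lines):
    """Group log lines by cell number; lines before the first header are skipped."""
    cells = {}
    current = None
    for line in lines:
        match = CELL_HEADER.match(line)
        if match:
            current = int(match.group(1))
            cells[current] = []
        elif current is not None and line.strip():
            cells[current].append(line.strip())
    return cells


sample_log = ["...", "", "Cell 1:", "0", "1", "4", "9", "16", ""]
print(parse_cell_logs(sample_log))
```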




## 3. Demo 2: Creating a code package asset and running a job <a class="anchor" id="part3"></a>

Before starting with this section, ensure that you have run the cells in [Section 1](#part1) and specified the ID of the space in which you will work.

A code package is a way of organizing a set of dependent files in a folder structure. For example, a code package can contain a notebook file that calls other notebook files or functions in script files.

Suppose you have a ZIP file of this folder structure on your local system and would like to run the code in the folder as a job on a CPD cluster. This section shows you how to create and register a code package asset in a deployment space and run the files in the code package asset as a job.

### 3.1 Creating a code package asset <a class="anchor" id="part3.1"></a>

Upload the .zip file:

In [32]:
remote_file_path = "code_package/cpdctl-test-code-package.zip"
local_file_path = "cpdctl-test-code-package.zip"

In [33]:
! cpdctl asset file upload --path {remote_file_path} --file {local_file_path} --space-id {space_id}

...
FAILED
Error opening file 'cpdctl-test-code-package.zip':
open cpdctl-test-code-package.zip: no such file or directory
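The upload above fails because the ZIP file is not present in the working directory. If your code lives in a folder, you can build the archive with the standard `zipfile` module first. The sketch below stores paths relative to the folder root. The folder layout is illustrative: `test.ipynb` matches the entrypoint name used later in this section, `helpers.py` is a made-up file, both are empty placeholders, and whether your files must sit at the archive root is an assumption to verify for your package.

```python
import os
import tempfile
import zipfile


def zip_directory(source_dir, zip_path):
    """Create a ZIP archive of `source_dir`, storing paths relative to it."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                zf.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path


# Build a tiny placeholder package in temporary directories.
package_dir = tempfile.mkdtemp()
for name in ("test.ipynb", "helpers.py"):
    with open(os.path.join(package_dir, name), "w", encoding="utf-8") as f:
        f.write("")

archive = zip_directory(
    package_dir,
    os.path.join(tempfile.mkdtemp(), "cpdctl-test-code-package.zip"),
)
with zipfile.ZipFile(archive) as zf:
    print(sorted(zf.namelist()))
```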



Create a code package asset. The code package asset has the same name as the ZIP file.

In [39]:
os.environ["CPDCTL_ENABLE_CODE_PACKAGE"] = "true"

In [41]:
file_name = "cpdctl-test-code-package.zip"

In [43]:
result = ! cpdctl code-package create --file-reference {remote_file_path} --name {file_name} --space-id {space_id} --output json -j "metadata.asset_id" --raw-output
code_package_id = result.s
print("code package id: {}".format(code_package_id))

code package id: 5b81c35c-4ccd-4090-993e-62cf5f6dd6d7


### 3.2 Running a job <a class="anchor" id="part3.2"></a>

List all the environments in your space, filter them by display name, and get the ID of the environment in which your code package will run:

In [44]:
environment_name = "Default Python 3.8"
query_string = "(resources[?entity.environment.display_name == '{}'].metadata.asset_id)[0]".format(environment_name)

In [45]:
result = ! cpdctl environment list --space-id {space_id} --output json -j "{query_string}" --raw-output
env_id = result.s
print("environment id: {}".format(env_id))

# You can also specify your environment id directly:
# env_id = "Your environment ID"

environment id: jupconda38-0f9bb565-a7d8-409b-baaf-5a56cd343155


To create a code package job, you need to give your job a name, add a description, set an entrypoint and pass the code package ID and the environment ID. Additionally, you can add environment variables that will be used in your notebook:

In [46]:
job_name = "cpdctl-test-code-package-job"
job = {
    'asset_ref': code_package_id, 
    'configuration': {
        'env_id': env_id, 
        'env_variables': [
            'foo=1', 
            'bar=2'
        ],
        'entrypoint': "test.ipynb"
    }, 
    'description': 'my code package job', 
    'name': job_name
}
job_json = json.dumps(job)

In [47]:
result = ! cpdctl job create --job '{job_json}' --space-id {space_id} --output json -j "metadata.asset_id" --raw-output
job_id = result.s
print("job id: {}".format(job_id))

job id: 6506e20b-33a6-4e80-9468-4982b046fb5d


Run a code package job:

In [48]:
job_run = {
    'configuration': {
        'env_variables': [
            'key1=value1', 
            'key2=value2'
        ]
    }
}
job_run_json = json.dumps(job_run)

In [49]:
result = ! cpdctl job run create --space-id {space_id} --job-id {job_id} --job-run '{job_run_json}' --output json -j "metadata.asset_id" --raw-output
run_id = result.s
print("run id: {}".format(run_id))

run id: 42c620cb-834a-46d2-b262-1a15d77fc687


You can see the output of each cell in your .ipynb file by listing job run logs:

In [50]:
! cpdctl job run logs --job-id {job_id} --run-id {run_id} --space-id {space_id}

...

Cell 1:
0
1
2
3
4




## 4. Demo 3: Promoting a notebook from a project to a space <a class="anchor" id="part4"></a>

Before starting with this section, ensure that you have run the cells in [Section 1](#part1) and specified the ID of the space in which you will work.

Suppose you have a notebook in a project and would like to promote a specific version of this notebook to a space. This section shows you how to promote a notebook from a project to a space on a CPD cluster.

Choose a project from which you will promote your notebook:

In [13]:
result = ! cpdctl project list --output json -j "(resources[].metadata.guid)[0]" --raw-output
project_id = result.s
print("project id: {}".format(project_id))

# You can also specify your project id directly:
# project_id = "Your project ID"

project id: f1be1dbd-fd40-4c0e-8c08-9d953b1d6750


Specify the notebook you would like to promote:

In [14]:
result = ! cpdctl asset search --type-name notebook --query "asset.asset_type:notebook" --project-id {project_id} --output json -j "(results[].metadata.asset_id)[0]" --raw-output
notebook_id_in_project = result.s
print("notebook id in project: {}".format(notebook_id_in_project))

# You can also specify your notebook id in project directly:
# notebook_id_in_project = "Your notebook ID in project"

notebook id in project: 9240ebc4-19d0-49ea-82cc-4d7a58454468


If your notebook has no version yet, create one and get its revision ID:

In [15]:
result = ! cpdctl notebook version create --notebook-id {notebook_id_in_project} --output json -j "entity.rev_id" --raw-output
revision_id = result.s
print("revision id: {}".format(revision_id))

revision id: 1


Or specify an existing revision of the notebook:

In [59]:
result = ! cpdctl notebook version list --notebook-id {notebook_id_in_project} --output json -j "(resources[].entity.rev_id)[0]" --raw-output
revision_id = result.s
print("revision id: {}".format(revision_id))

# You can also specify your revision id directly:
# revision_id = "Your revision ID"

revision id: 7
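The query above takes the first entry of the version list. If you want the newest version instead, one option is to pick the highest `rev_id` in Python. The sketch below assumes the `resources[].entity.rev_id` shape used in the query and that revision IDs are integers that increase over time; both the ordering assumption and the helper `latest_revision` are not confirmed by this notebook. The sample payload is hand-written.

```python
import json


def latest_revision(versions_json):
    """Return the highest rev_id in a version listing, or None if it is empty."""
    payload = json.loads(versions_json)
    rev_ids = [int(r["entity"]["rev_id"]) for r in payload.get("resources", [])]
    return max(rev_ids) if rev_ids else None


sample = json.dumps({
    "resources": [
        {"entity": {"rev_id": 7}},
        {"entity": {"rev_id": 3}},
    ]
})
print(latest_revision(sample))
```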


Promote the notebook to the space. The parameters `name` and `description` are optional. If they are not specified, the name and description of the original notebook in the project will be used.

In [16]:
notebook_name = "cpdctl_test_promote"
notebook_description = "cpdctl test promote"
request_body = {
    'space_id': space_id,
    'metadata': {
        'name': notebook_name,
        'description': notebook_description
    }
}
request_body = json.dumps(request_body)
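Because `name` and `description` are optional, you may prefer to leave them out of the request body rather than send empty values. The sketch below builds the body from whatever is provided. `build_promote_body` is a hypothetical helper, and omitting the whole `metadata` object when neither field is set is an assumption about what the service accepts.

```python
import json


def build_promote_body(space_id, name=None, description=None):
    """Build the promote request body, omitting metadata fields that are None."""
    body = {"space_id": space_id}
    candidates = (("name", name), ("description", description))
    metadata = {key: value for key, value in candidates if value is not None}
    if metadata:
        body["metadata"] = metadata
    return json.dumps(body)


print(build_promote_body("f2124828-0125-45e8-8dda-e8b8ad142fb6"))
print(build_promote_body("f2124828-0125-45e8-8dda-e8b8ad142fb6", name="cpdctl_test_promote"))
```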

In [17]:
result = ! cpdctl asset promote --asset-id {notebook_id_in_project} --revision-id {revision_id} --project-id {project_id} --request-body '{request_body}'
# verify that the notebook has been promoted into the space
result = ! cpdctl asset search --space-id {space_id} --type-name notebook --query asset.name:{notebook_name} --output json -j "(results[].metadata.asset_id)" --raw-output
notebook_id_in_space = result.s
print("notebook id in space: {}".format(notebook_id_in_space))

notebook id in space: [   "3062c842-9fba-4ee7-95e8-2fe926d075f7" ]
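The search query above returns a JSON array, so the printed value is a list rather than a single ID. To continue working with the ID, you can parse the captured text. This is a minimal sketch; `first_id` is a hypothetical helper, and the sample string is the output shown above.

```python
import json


def first_id(raw_output):
    """Parse a JSON array of IDs and return the first one, or None if empty."""
    ids = json.loads(raw_output)
    return ids[0] if ids else None


raw = '[   "3062c842-9fba-4ee7-95e8-2fe926d075f7" ]'
print(first_id(raw))
```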


Copyright © 2021 IBM. This notebook and its source code are released under the terms of the MIT License.