# CML API Example Notebook

This notebook demonstrates the core functionality of the CML API. It uses the CML API Python client to make requests to the API service and operate on the responses.

Running this notebook will create and delete projects and jobs. It will launch job runs, applications, and model deployments. If the notebook terminates before completing all the cells, some resources may linger and need to be manually terminated/removed. Rerunning cells that create resources may lead to additional resources being created.

In [None]:
# Install cmlapi package
try:
    import cmlapi
except ModuleNotFoundError:
    import os
    cluster = os.getenv("CDSW_API_URL")[:-1]+"2"
    !pip3 install {cluster}/python.tar.gz
    import cmlapi

from cmlapi.utils import Cursor
import string
import random
import json

try:
    client = cmlapi.default_client()
except ValueError:
    print("Could not create a client. If this code is not being run in a CML session, please include the keyword arguments \"url\" and \"cml_api_key\".")

In [None]:
session_id = "".join([random.choice(string.ascii_lowercase) for _ in range(6)])
session_id

## Note
- In addition to the Python client, you can call the API directly with **_curl_**, for example:  
`> curl -X GET -H "Authorization: Bearer ${APIKEY}" "https://${CML_DOMAIN}/api/v2/projects" | python -m json.tool`

- The returned objects are _not_ Python dictionaries, so you can't use `object[field]` to reference the fields of the response. Instead, you can use `object.field` to reference the properties of response objects.

- All the list_XXX endpoints share the same arguments, particularly search_filter and sort. Examples are provided in [**list_projects**](#list_projects).

- The following examples use ML Runtimes. Usage with legacy engines is similar: set `kernel` instead of `runtime_identifier` in requests that accept it.
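The curl call above maps directly onto a plain HTTPS request. As a minimal sketch using only the Python standard library (the helper names `build_list_projects_request` and `list_projects_raw` are hypothetical, and the domain and API key are the same placeholders as in the curl command), the request could be built and sent like this:

```python
import json
import urllib.request

def build_list_projects_request(cml_domain: str, api_key: str) -> urllib.request.Request:
    """Build the same GET /api/v2/projects request as the curl example (no network access)."""
    return urllib.request.Request(
        f"https://{cml_domain}/api/v2/projects",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def list_projects_raw(cml_domain: str, api_key: str) -> dict:
    """Send the request and decode the JSON body into a plain Python dictionary."""
    request = build_list_projects_request(cml_domain, api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `list_projects_raw("<your CML domain>", "<your API key>")` returns the decoded JSON as a plain dictionary, so `result["field"]` indexing works there, unlike with the response objects returned by the `cmlapi` client.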

## Create Project
#### CreateProjectRequest

Argument | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**name** | **str** | The name of the project to create. | 
**description** | **str** | The description of the project. | \[optional\] 
**visibility** | **str** | The visibility of the project (one of "public", "organization", "private"). Default is private. | \[optional\] 
**parent_project** | **str** | Optional parent project to fork. | \[optional\] 
**git_url** | **str** | Optional git URL to check out for this project. | \[optional\] 
**template** | **str** | Optional template to use (Python, R, PySpark, Scala, Churn Predictor, local, git, blank). Note: local creates the project but nothing else; files must be uploaded separately. | \[optional\]
**organization_permission** | **str** | If this is an organization-wide project, the visibility to others in the organization. | \[optional\] 
**default_project_engine_type** | **str** | Whether this project uses legacy engines or runtimes. Valid values are "ml_runtime", "legacy_engine", or leave blank to use the site-wide default. | \[optional\]
**environment** | **dict(str, str)** | The default set of environment variables to run with. | \[optional\] 
**shared_memory_limit** | **int** | Additional shared memory limit that engines in this project should have, in MB (default 64). | \[optional\] 
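The allowed values in the table can be checked before a request is sent. The following is a sketch, not part of `cmlapi`: the hypothetical helper `project_request_kwargs` validates `visibility`, `default_project_engine_type`, and the `environment` types against the table above and returns keyword arguments for `cmlapi.CreateProjectRequest`.

```python
# Allowed values copied from the CreateProjectRequest table above.
VALID_VISIBILITY = {"public", "organization", "private"}
VALID_ENGINE_TYPES = {"ml_runtime", "legacy_engine"}

def project_request_kwargs(name, visibility="private", default_project_engine_type=None,
                           environment=None, **extra):
    """Validate a few CreateProjectRequest arguments and return them as keyword arguments."""
    if not name:
        raise ValueError("name is required")
    if visibility not in VALID_VISIBILITY:
        raise ValueError(f"visibility must be one of {sorted(VALID_VISIBILITY)}")
    kwargs = {"name": name, "visibility": visibility}
    # Omitting the engine type leaves it blank, so the site-wide default applies.
    if default_project_engine_type is not None:
        if default_project_engine_type not in VALID_ENGINE_TYPES:
            raise ValueError(f"default_project_engine_type must be one of {sorted(VALID_ENGINE_TYPES)}")
        kwargs["default_project_engine_type"] = default_project_engine_type
    if environment is not None:
        # The table types environment as dict(str, str).
        if not all(isinstance(k, str) and isinstance(v, str) for k, v in environment.items()):
            raise TypeError("environment keys and values must be strings")
        kwargs["environment"] = dict(environment)
    kwargs.update(extra)  # e.g. description, template, shared_memory_limit
    return kwargs
```

Usage, assuming `cmlapi` is installed: `cmlapi.CreateProjectRequest(**project_request_kwargs("demo_" + session_id, template="Python", default_project_engine_type="ml_runtime"))`.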

In [None]:
body = cmlapi.CreateProjectRequest(
    name = "demo_"+session_id,
    description = "A demo project created using the CML public API",
    default_project_engine_type = "ml_runtime",
    template = "Python")
# Create the project
project = client.create_project(body)
project_id = project.id

In [None]:
project

<a id='list_projects'></a>
## List/Get Project
The list_projects API call takes the following parameters:

Argument | Type | Description | Notes
------------- | ------------- | ------------- | -------------
**search_filter** | **str** | Optional HTTP parameter to filter results by. Supported search filter keys are: \[creator.email, creator.name, creator.username, description, name, owner.email, owner.name, owner.username\]. For example: `search_filter={"name":"foo","creator.name":"bar"}`. | [optional] 
**sort** | **str** | Optional HTTP parameter to sort results by. Supported sort keys are: \[created_at, creator.email, creator.name, creator.username, description, name, owner.email, owner.name, owner.username, updated_at\]. Prefix a key with "+" for ascending order or "-" for descending order, and separate multiple keys with commas. For example: `sort=-updated_at,+name`. | [optional] 
**page_size** | **int** | Optional number of entries to return per page. If not specified, the server determines the page size. If specified, it must be specified again on each subsequent request that uses the next page token from the response. | [optional] 
**page_token** | **str** | Optional token specifying which page of results to get. If not specified, the first page is returned. Each response includes a token for the next page, which is empty if there is no next page. | [optional] 
**include_public_projects** | **bool** | Default is false. If set to true, the response includes all projects the user has access to, including public projects. | [optional] 
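Because `search_filter` is a JSON string and `sort` is a comma-separated list of `+`/`-` prefixed keys, it can help to build both from Python values. A minimal sketch: the helper `list_projects_params` is hypothetical, and its key sets are copied from the table above.

```python
import json

# Keys copied from the list_projects table above.
SEARCH_FILTER_KEYS = {
    "creator.email", "creator.name", "creator.username", "description",
    "name", "owner.email", "owner.name", "owner.username",
}
SORT_KEYS = SEARCH_FILTER_KEYS | {"created_at", "updated_at"}

def list_projects_params(filters=None, sort=None):
    """Build search_filter/sort keyword arguments.

    filters: dict mapping a search key to the value to match.
    sort: list of (key, ascending) pairs, applied in order.
    """
    params = {}
    if filters:
        unknown = set(filters) - SEARCH_FILTER_KEYS
        if unknown:
            raise ValueError(f"unsupported search_filter keys: {sorted(unknown)}")
        params["search_filter"] = json.dumps(filters)
    if sort:
        parts = []
        for key, ascending in sort:
            if key not in SORT_KEYS:
                raise ValueError(f"unsupported sort key: {key}")
            parts.append(("+" if ascending else "-") + key)
        params["sort"] = ",".join(parts)
    return params
```

For example, `client.list_projects(page_size=5, **list_projects_params({"name": "demo"}, [("updated_at", False)]))` lists up to five projects whose names contain "demo", most recently updated first.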

In [None]:
# List projects using the default sort and default page size (10)
client.list_projects()

In [None]:
# You can add search filters on project fields to return only projects whose field contains the filter value. The following filters for projects whose names contain the word "demo".
search_filters = {
    "name": "demo",
}

# List 5 projects that the user has direct read access to (not including general public projects or organization-level projects that the user has no specific permission for), sorted by last update time in ascending order.
projects_list = client.list_projects(
    page_size = 5,
    search_filter = json.dumps(search_filters),
    sort="updated_at"
)
print(projects_list)
# If there are more than 5 such projects, fetch the next 5 using the page_token. The same keyword parameters MUST be included.
if projects_list.next_page_token:
    next_page_projects_list = client.list_projects(
        page_size = 5,
        search_filter = json.dumps(search_filters),
        sort="updated_at",
        page_token = projects_list.next_page_token
    )
# Fetch the first 5 projects sorted by updated_at in descending order.
client.list_projects(
    page_size = 5,
    search_filter = json.dumps(search_filters),
    sort="-updated_at"
)
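The page-token pattern above generalizes to a small generator. This is a hand-rolled sketch, not the `cmlapi` implementation; `iterate_pages` is a hypothetical name, and it assumes each response exposes its items under an attribute such as `projects`, `jobs`, or `runtimes`, plus a `next_page_token` that is empty on the last page.

```python
def iterate_pages(fetch, items_attr, **kwargs):
    """Yield every item from a paginated list call.

    fetch: a list_XXX-style callable, e.g. client.list_projects.
    items_attr: name of the response attribute holding the items, e.g. "projects".
    kwargs: passed on every call, since page_size, search_filter and sort
            must be repeated when a page token is used.
    """
    token = None
    while True:
        call_kwargs = dict(kwargs)
        if token:
            call_kwargs["page_token"] = token
        page = fetch(**call_kwargs)
        yield from getattr(page, items_attr)
        # An empty (or missing) next_page_token means this was the last page.
        token = getattr(page, "next_page_token", None)
        if not token:
            return
```

For example, `for p in iterate_pages(client.list_projects, "projects", page_size=5, search_filter=json.dumps(search_filters)): print(p.name)`. The Cursor helper in `cmlapi.utils`, covered below, offers a ready-made way to iterate over list results.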

In [None]:
# Get a specific project given its ID.
client.get_project(project_id = project_id)

## Update Project

The following project fields can be updated.
 - name
 - description
 - visibility
 - default_project_engine_type
 - shared_memory_limit
 - environment

The update can be provided as a Project object with the updated fields set to their new values and all other fields set to None, or as a dictionary mapping each field to its new value.
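With the dictionary form, it can be convenient to send only the fields that actually changed. A minimal sketch, assuming you start from a plain dictionary of current values (for example from the response object's `to_dict()` method, which swagger-generated models such as these typically provide); `minimal_update` is a hypothetical helper.

```python
def minimal_update(current: dict, desired: dict, updatable: set) -> dict:
    """Return only the updatable fields whose desired value differs from the current one."""
    not_updatable = set(desired) - set(updatable)
    if not_updatable:
        raise ValueError(f"fields cannot be updated: {sorted(not_updatable)}")
    # Keep a field only if its new value differs from the current value.
    return {
        field: value
        for field, value in desired.items()
        if current.get(field) != value
    }
```

For example, `client.update_project(minimal_update(project.to_dict(), {"description": "Updated description"}, {"name", "description"}), project_id)` sends a dictionary containing only `description`.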


In [None]:
update_body = cmlapi.Project(
    name = "updated_" + project.name,
    visibility = "public"
)
client.update_project(update_body, project_id)

## List Runtimes
When using runtimes, you need to specify which runtime to use. List the runtimes to find their identifiers, then include one in your request. You can ignore this field if you are using legacy engines.

Like projects, runtimes can be filtered. You can filter runtimes on the following fields:
- image_identifier
- editor
- kernel
- edition
- description
- full_version


In [None]:
client.list_runtimes()

In [None]:
# Filter for Standard edition Python 3.9 runtimes that use the Workbench editor.
py39_standard_runtimes = client.list_runtimes(search_filter=json.dumps({
    "kernel": "Python 3.9",
    "edition": "Standard",
    "editor": "Workbench"
}))

print(py39_standard_runtimes)

# Save the image identifier for later. Fail early with a clear message if no runtime matched.
if not py39_standard_runtimes.runtimes:
    raise RuntimeError("No Standard edition Python 3.9 Workbench runtime found; adjust the filter above.")
py39_standard_runtime_image_identifier = py39_standard_runtimes.runtimes[0].image_identifier
print("Image identifier of the selected Python 3.9 Standard runtime: ", py39_standard_runtime_image_identifier)

## Cursor helper
This helper works with any _list_ endpoint (list_projects, list_jobs, list_runtimes, ...).

Calling `items()` on a Cursor returns an iterable of the listed objects.

In [None]:
# cursor also supports search_filter
# cursor = Cursor(client.list_runtimes, 
#                 search_filter = json.dumps({"image_identifier":"jupyter"}))
cursor = Cursor(client.list_runtimes)
runtimes = cursor.items()
for rt in runtimes:
    print(rt.image_identifier)

## Create Job
#### CreateJobRequest

Argument | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**project_id** | **str** | ID of the project containing the job. |
**name** | **str** | Name of the new job. | 
**script** | **str** | The script to run for the new job. | 
**cpu** | **float** | CPU cores to allocate to job runs for this job (default 1). | \[optional\] 
**memory** | **float** | Memory in GB to allocate to job runs for this job (default 1). | \[optional\] 
**nvidia_gpu** | **int** | Number of Nvidia GPUs to allocate to this job (default 0). | \[optional\] 
**parent_job_id** | **str** | Optional parent job for this new job. Setting this to a parent job will make this job run when the parent job completes successfully. Cannot be used alongside "schedule". | \[optional\] 
**environment** | **dict(str, str)** | Default environment variables to include in job runs for this job. | \[optional\] 
**arguments** | **str** | Default arguments to pass to job runs for this job. | \[optional\] 
**timeout** | **int** | Timeout in seconds of job runs for this job. | \[optional\] 
**schedule** | **str** | Schedule to run a job automatically. Cannot be used in a dependency job. Follows the cron format. For example, to execute the job every Monday at 1 PM UTC, the schedule would be `0 13 * * 1`. | \[optional\] 
**kernel** | **str** | Kernel to run the job runs on. Possible values are python3, python2, r, or scala. Should not be set if the project uses ML Runtimes. | \[optional\] 
**recipients** | **list\[JobRecipient\]** | An optional list of recipients to receive notifications for job events such as successful runs, failures, and manual stops. | \[optional\] 
**attachments** | **list\[str\]** | Files to attach (with path relative to /home/cdsw/) in notification emails. For example, to attach a file located at /home/cdsw/report/result.csv, include "report/result.csv" in the array for this field. | \[optional\] 
**runtime_identifier** | **str** | The runtime image identifier to use if this job is part of an ML Runtime project. Must be set if using ML Runtimes. | \[optional\] 
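Two constraints in this table are easy to trip over: `schedule` cannot be combined with `parent_job_id`, and `kernel` should not be set when the project uses ML Runtimes. A sketch of a hypothetical helper that checks both before calling `cmlapi.CreateJobRequest`, assuming the standard five-field cron syntax for `schedule`:

```python
def job_request_kwargs(project_id, name, script, schedule=None, parent_job_id=None,
                       runtime_identifier=None, kernel=None, **extra):
    """Check CreateJobRequest constraints from the table above and return keyword arguments."""
    if schedule and parent_job_id:
        raise ValueError("schedule cannot be used together with parent_job_id")
    # Assumes the standard five-field cron syntax, e.g. "0 13 * * 1".
    if schedule is not None and len(schedule.split()) != 5:
        raise ValueError("schedule should have five cron fields")
    if runtime_identifier and kernel:
        raise ValueError("kernel should not be set for ML Runtime projects")
    kwargs = {"project_id": project_id, "name": name, "script": script}
    optional = {
        "schedule": schedule,
        "parent_job_id": parent_job_id,
        "runtime_identifier": runtime_identifier,
        "kernel": kernel,
    }
    # Only pass optional arguments that were actually given.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    kwargs.update(extra)  # e.g. cpu, memory, timeout, environment
    return kwargs
```

For example, `cmlapi.CreateJobRequest(**job_request_kwargs(project_id, "weeklyJob", "analysis.py", schedule="0 13 * * 1", runtime_identifier=py39_standard_runtime_image_identifier))` describes a job that runs every Monday at 1 PM UTC.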

In [None]:
# Create a job. We will create dependent/children jobs of this job, so we call this one a "grandparent job". The parameter "runtime_identifier" is needed if this is running in a runtimes project.
grandparent_job_body = cmlapi.CreateJobRequest(
    project_id = project_id,
    name = "grandparentJob",
    script = "analysis.py",
    runtime_identifier = py39_standard_runtime_image_identifier,
)
# Create this job within the project specified by the project_id parameter.
grandparent_job = client.create_job(grandparent_job_body, project_id)

### Create dependent jobs
When a parent job runs and completes successfully, its child (dependent) jobs start automatically.
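Because each dependent job starts only after its parent succeeds, a set of jobs forms a tree rooted at the job you start. The hypothetical helper below is a plain-Python illustration, independent of the API: it groups job names by how many upstream runs must succeed before they trigger. It says nothing about whether sibling jobs run concurrently.

```python
from collections import defaultdict

def trigger_levels(parent_of: dict, root: str) -> list:
    """Group jobs into levels: level 0 is the root, level n triggers after level n-1 succeeds.

    parent_of maps each job name to its parent job name (or None for no parent).
    """
    children = defaultdict(list)
    for job, parent in parent_of.items():
        if parent is not None:
            children[parent].append(job)
    levels, current, seen = [], [root], {root}
    while current:
        levels.append(current)
        next_level = []
        for job in current:
            for child in sorted(children[job]):
                if child not in seen:  # guard against malformed, cyclic input
                    seen.add(child)
                    next_level.append(child)
        current = next_level
    return levels
```

For the chain built below, `trigger_levels({"grandparentJob": None, "parentJob": "grandparentJob", "childJob": "parentJob"}, "grandparentJob")` yields one job per level.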

In [None]:
# Create a dependent job by specifying the parent job's ID in the parent_job_id field.
parent_job_body = cmlapi.CreateJobRequest(
    project_id = project_id,
    name = "parentJob",
    script = "analysis.py",
    runtime_identifier = py39_standard_runtime_image_identifier,
    parent_job_id = grandparent_job.id
)
parent_job = client.create_job(parent_job_body, project_id)

In [None]:
# Create a job that depends on the job from the previous cell. This leads to a dependency chain of grandparent_job -> parent_job -> child_job.
# If grandparent_job runs and succeeds, parent_job will trigger, and if parent_job runs and succeeds, child_job will trigger.
# This one uses a template script that does not terminate, so we'll have the opportunity to try stopping it later.
child_job_body = cmlapi.CreateJobRequest(
    project_id = project_id,
    name = "childJob",
    script = "entry.py",
    runtime_identifier = py39_standard_runtime_image_identifier,
    parent_job_id = parent_job.id
)
child_job = client.create_job(child_job_body, project_id)

## List/Get Job

In [None]:
# This will list jobs in the project. By default it will list the first 10, and provide a next_page_token to list more if there are any. This behavior can be controlled by adding the keyword argument "page_size".
joblists = client.list_jobs(project_id = project_id)
print(f'Fetched {len(joblists.jobs)} jobs from the project')

In [None]:
# Get a specific job given the project and job ID.
client.get_job(project_id = project_id, job_id = parent_job.id)

In [None]:
# Walk up the dependency chain from child_job, printing each job and then its ancestors.
cur_job_id = child_job.id
while cur_job_id:
    job = client.get_job(project_id = project_id, job_id = cur_job_id)
    print('Job ID:   {}\n  name:   {}\n  script: {}'.format(job.id, job.name, job.script))
    cur_job_id = job.parent_id


## Update Job

When updating a job, you can modify the following fields.

- schedule
- parent_id
- name
- timeout
- cpu
- memory
- nvidia_gpu
- environment

As with projects, you can submit the modification as a `cmlapi.Job` object in which the fields to update are set to their new values and all other fields are left as None (their default when creating a new `cmlapi.Job`). Alternatively, you can use a dictionary that maps only the fields being updated to their new values.

In [None]:
update_body = cmlapi.Job(name = "updated_"+ parent_job.name)
client.update_job(update_body, project_id, parent_job.id)

## Create JobRun

#### CreateJobRunRequest
Argument | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**project_id** | **str** | ID of the project containing the job. | 
**job_id** | **str** | The job ID to create a new job run for. | 
**environment** | **dict(str, str)** | The environment variables to include in this run. | \[optional\] 
**arguments** | **str** | The custom arguments to the job run | \[optional\] 

In [None]:
# Create a job run for the specified job.
# If the job has dependent jobs, the dependent jobs will run after the job succeeds.
# In this case, the grandparent job will run first, then the parent job, and then the child job, provided each job run succeeds.
jobrun_body = cmlapi.CreateJobRunRequest(project_id, grandparent_job.id)
job_run = client.create_job_run(jobrun_body, project_id, grandparent_job.id)
run_id = job_run.id

In [None]:
job_run

## List/Get JobRun



In [None]:
# Get a job run given its ID, as well as the job ID and project ID containing the job run.
client.get_job_run(project_id, grandparent_job.id, run_id)

In [None]:
# List the runs of a job (up to page_size per page, default 10).
job_runs = client.list_job_runs(project_id, child_job.id)
job_runs
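To get a quick overview of a job's history, you can count its runs by status. A minimal sketch, assuming each run object exposes a `status` attribute; the hypothetical helpers below work on any iterable of such objects, for example `job_runs.job_runs`.

```python
from collections import Counter

def count_runs_by_status(runs) -> Counter:
    """Count job runs by their status value (None if a run has no status attribute)."""
    return Counter(getattr(run, "status", None) for run in runs)

def format_status_counts(counts: Counter) -> str:
    """Render counts as 'status: n' pairs, most frequent first."""
    return ", ".join(f"{status}: {n}" for status, n in counts.most_common())
```

For example, `print(format_status_counts(count_runs_by_status(job_runs.job_runs)))`.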

## Stop JobRun

In [None]:
# Stop a job run. This stops whatever the job run is doing and terminates the engine.
# We don't know which job in the chain is currently running, so we try to stop the latest run of each job and ignore errors raised for runs that have already stopped.
# Since each job can have at most one active run, we only need to check the most recent job run for each job.
for job in [grandparent_job, parent_job, child_job]:
    job_runs = client.list_job_runs(project_id, job.id, sort="-created_at", page_size=1)
    if len(job_runs.job_runs) == 1:
        job_run = job_runs.job_runs[0]
        try:
            # Use the ID of the job that owns this run.
            client.stop_job_run(project_id, job.id, job_run.id)
        except cmlapi.rest.ApiException:
            pass

## Create Application

#### CreateApplicationRequest
Argument | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**project_id** | **str** | The project's identifier | 
**name** | **str** | Name of the new application. |  
**subdomain** | **str** | The subdomain of the application. The application will be served at the URL `http(s)://<subdomain>.<domain>`. | 
**description** | **str** | The description of the application. | [optional] 
**script** | **str** | The script to run for the new application. | 
**cpu** | **float** | CPU cores to allocate to application (default 1). | [optional] 
**memory** | **float** | Memory in GB to allocate to application (default 1). | [optional] 
**nvidia_gpu** | **int** | Number of Nvidia GPUs to allocate to this application (default 0). | [optional] 
**environment** | **dict(str, str)** | Default environment variables to include in application. | [optional] 
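**kernel** | **str** | Kernel to run the application on. Possible values are python3, python2, r, or scala. | [optional] 
**bypass_authentication** | **bool** | Enable unauthenticated access to the application. | [optional] 
**runtime_identifier** | **str** | The runtime image identifier to use. Must be set if using ML Runtimes. | [optional] 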

Creating an application also starts it.
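Since the subdomain becomes part of a URL, it has to be valid in a hostname. A sketch under the assumption that CML accepts standard DNS-label characters (lowercase letters, digits, and hyphens, at most 63 characters, as hostname labels allow); your cluster may apply stricter rules. `to_subdomain` and `application_url` are hypothetical helpers.

```python
import re

def to_subdomain(name: str, max_len: int = 63) -> str:
    """Derive a DNS-label-style subdomain: lowercase letters, digits and single hyphens."""
    label = re.sub(r"[^a-z0-9-]+", "-", name.lower())  # replace runs of invalid characters
    label = re.sub(r"-{2,}", "-", label).strip("-")       # collapse and trim hyphens
    label = label[:max_len].rstrip("-")                   # DNS labels are at most 63 characters
    if not label:
        raise ValueError(f"cannot derive a subdomain from {name!r}")
    return label

def application_url(subdomain: str, domain: str, scheme: str = "https") -> str:
    """Build the URL an application is served at: scheme://<subdomain>.<domain>."""
    return f"{scheme}://{subdomain}.{domain}"
```

For example, `to_subdomain("demo_app_" + session_id)` turns the underscores in the application name into hyphens.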

In [None]:
# This creates a simple application. If using runtimes, the runtime_identifier must be specified.
application_request = cmlapi.CreateApplicationRequest(
    name = "demo_app_"+session_id,
    description = "A sample application to demonstrate CML APIs",
    project_id = project_id,
    subdomain = "demo-"+session_id,
    runtime_identifier = py39_standard_runtime_image_identifier,
    script = "entry.py",
)
app = client.create_application(
    project_id = project_id,
    body = application_request
)

## List/Get Applications

Applications can be listed using the same mechanisms (sort, search_filter, page_size, etc.) as the other resources we've seen so far. Applications can be filtered on the following properties:
- creator.email
- creator.name
- creator.username
- description
- full_name
- name
- script
- subdomain
- status
- kernel
- bypass_authentication
- runtime_identifier

Applications can also be sorted on the following properties:
- created_at
- creator.email
- creator.name
- creator.username
- description
- name
- kernel
- script
- updated_at
- status
- runtime_identifier


In [None]:
# You can list applications similarly to other resources.
client.list_applications(project_id = project_id)

## Update Application

When updating an application, you can modify the following fields:
- name
- subdomain
- description
- script
- bypass_authentication
- kernel
- cpu
- memory
- nvidia_gpu
- environment

Modifying these fields can be done similarly to how we updated projects and jobs earlier.

In [None]:
update_application_req = cmlapi.Application(
    name = "updated_" + app.name,
    subdomain = "updated-" + app.subdomain,
    description = "updated_" + app.description,
    environment = json.dumps({"UPDATED_ENV": "UPDATED_ENV_VALUE"}),
)
updated_application = client.update_application(
    update_application_req,
    project_id = project_id,
    application_id = app.id
)
updated_application

## Model

Deploying a model is a three-step process. First, create a model with some basic information. Second, build the model by specifying the file to use and function to run. Finally, deploy the model with some allocated resources.

1. CreateModel

#### CreateModelRequest

Argument | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**project_id** | **str** | ID of the project containing the model. |  
**name** | **str** | Name of the model. |  
**description** | **str** | Description of the model. | \[optional\] 
**disable_authentication** | **bool** | Whether to disable authentication for requests to deployments of this model. | \[optional\] 

2. CreateModelBuild

#### CreateModelBuildRequest

Argument | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**project_id** | **str** | ID of the project containing the model build. |
**model_id** | **str** | The ID of the model to build. | 
**comment** | **str** | A comment associated with the build. | \[optional\] 
**file_path** | **str** | The path to the file to build. | 
**function_name** | **str** | The name of the function in `file_path` to call when the deployed model receives a request. |  
**kernel** | **str** | The kernel the model build should use. | 
**runtime_identifier** | **str** | The runtime ID the model build should use. | 


3. CreateModelDeployment

#### CreateModelDeploymentRequest

Argument | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**project_id** | **str** | ID of the project containing the model. | 
**model_id** | **str** | ID of the model to deploy. | 
**build_id** | **str** | ID of the model build to deploy. | 
**cpu** | **float** | Number of vCPU to allocate to the deployment. | \[optional\] 
**memory** | **float** | Amount of memory in GB to allocate to the deployment. | \[optional\] 
**nvidia_gpus** | **int** | Number of nvidia GPUs to allocate to the deployment. | \[optional\] 
**environment** | **dict(str, str)** | Environment variables to run the deployment with. | \[optional\] 

In [None]:
modelReq = cmlapi.CreateModelRequest(
    name = "demo-model-" + session_id,
    description = "model created for demo",
    project_id = project_id,
)
model = client.create_model(modelReq, project_id)

In [None]:
model_build_request = cmlapi.CreateModelBuildRequest(
    project_id = project_id,
    model_id = model.id,
    comment = "test comment",
    file_path = "pi.py",
    function_name = "predict",
    runtime_identifier = py39_standard_runtime_image_identifier,
)
modelBuild = client.create_model_build(
    model_build_request, project_id, model.id
)
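A deployment needs a finished build, and builds take time. A defensive option is to poll the build's status before creating the deployment. The generic helper below is a sketch: `wait_for_status` is a hypothetical name, and the status strings you pass in are assumptions that you should confirm by printing the `status` of `client.get_model_build(...)` on your cluster.

```python
import time

def wait_for_status(get_status, success, failure, timeout=600.0, interval=10.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a value in success (returned) or failure (raises).

    sleep and clock are injectable so the helper can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in success:
            return status
        if status in failure:
            raise RuntimeError(f"reached failure status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)
```

For example, `wait_for_status(lambda: client.get_model_build(project_id=project_id, model_id=model.id, build_id=modelBuild.id).status, success={"built"}, failure={"build failed"})`, where `"built"` and `"build failed"` are assumed status values.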

In [None]:
model_deployment = cmlapi.CreateModelDeploymentRequest(
    project_id = project_id,
    model_id = model.id,
    build_id = modelBuild.id
)
model_deployment_response = client.create_model_deployment(
    model_deployment,
    project_id = project_id,
    model_id = model.id,
    build_id = modelBuild.id
)

## Get/List models

In [None]:
client.get_model(project_id = project_id, model_id = model.id)

In [None]:
client.list_models(project_id = project_id)

## Get/List model_builds

In [None]:
client.get_model_build(
    project_id = project_id,
    model_id = model.id,
    build_id = modelBuild.id
)

In [None]:
client.list_model_builds(project_id = project_id, model_id = model.id)

## Get/List model_deployments


In [None]:
client.get_model_deployment(
    project_id = project_id,
    model_id = model.id,
    build_id = modelBuild.id,
    deployment_id = model_deployment_response.id
)

In [None]:
client.list_model_deployments(
    project_id = project_id,
    model_id = model.id,
    build_id = modelBuild.id
)

## Stop model deployment

In [None]:
client.stop_model_deployment(
    project_id = project_id,
    model_id = model.id,
    build_id = modelBuild.id,
    deployment_id = model_deployment_response.id
)

## Deleting resources

In [None]:
# Deleting a job does not delete its dependent jobs.
client.delete_job(project_id = project_id, job_id = parent_job.id)

In [None]:
client.delete_application(project_id = project_id, application_id = app.id)

In [None]:
# Uncomment the following lines to delete the project
# client.delete_project(project_id)
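Because an interrupted run can leave resources behind (see the note at the top of this notebook), a cleanup cell that keeps going after individual failures can help. A minimal sketch with a hypothetical helper `run_cleanup`; it only sequences calls you supply and swallows the exception types you list.

```python
def run_cleanup(steps, ignore=(Exception,)):
    """Run each (description, callable) step, continuing past failures.

    Returns "description: error" strings for steps that raised one of the ignored exception types.
    """
    failed = []
    for description, step in steps:
        try:
            step()
        except ignore as exc:
            failed.append(f"{description}: {exc}")
    return failed
```

For example, `run_cleanup([("delete grandparent job", lambda: client.delete_job(project_id=project_id, job_id=grandparent_job.id)), ("delete child job", lambda: client.delete_job(project_id=project_id, job_id=child_job.id))], ignore=(cmlapi.rest.ApiException,))` returns the steps that failed, such as jobs that were already deleted.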

**_If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required notices. A copy of the Apache License Version 2.0 can be found in LICENSE.txt of this repository._**