# About this Jupyter Notebook

@author: Yingding Wang

This notebook demonstrates how to mount an existing volume into a Kubeflow pipeline step.

### Useful JupyterLab Basics

Before you start, consider updating JupyterLab with the command

<code>python
!{sys.executable} -m pip install --upgrade --user jupyterlab    
</code>  

1. Autocomplete syntax with "Tab"
2. View Doc String with "Shift + Tab"
3. Mark the code snippet -> right-click the selection -> Show Contextual Help (shows the function code)

In [1]:
import sys, os
print(f"Sys version: {sys.version}")

Sys version: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) 
[GCC 9.3.0]


In [2]:
# install the pinned kfp SDK version (comment out if it is already installed)
!{sys.executable} -m pip install --upgrade --user kfp==1.8.14



# (optional step) Upgrade JupyterLab
Run the following cells to check and upgrade the JupyterLab server. You need to restart the notebook server for the upgrade to take effect.

In [3]:
!{sys.executable} -m pip show jupyterlab # 3.4.3
# !{sys.executable} -m pip show jupyter_contrib_nbextensions

Name: jupyterlab
Version: 3.4.3
Summary: JupyterLab computational environment
Home-page: https://jupyter.org
Author: Jupyter Development Team
Author-email: jupyter@googlegroups.com
License: 
Location: /opt/conda/lib/python3.8/site-packages
Requires: ipython, jinja2, jupyter-core, jupyter-server, jupyterlab-server, nbclassic, packaging, tornado
Required-by: 


In [4]:
# update the jupyter lab
#!{sys.executable} -m pip install --upgrade --user jupyterlab==3.4.3

In [5]:
# update the jupyter lab
# uncomment the following command to update jupyterlab
# !{sys.executable} -m pip install --upgrade --user jupyterlab==3.4.3

## (optional) Restart your notebook server in Kubeflow by stop and start

(optional) Upgrade your kfp-server-api

If you see a headtoken issue while starting a Kubeflow pipeline from your notebook, run this optional step to upgrade kfp-server-api so that it matches the Kubeflow Pipelines backend.


In [6]:
# show the kfp-server-api version, 1.7.0 for kf 1.4, 1.7.1 for kf 1.4.1 and 1.8.1 for kf 1.5.0
!{sys.executable} -m pip list | grep kfp

kfp                      1.8.14
kfp-pipeline-spec        0.1.16
kfp-server-api           1.8.5


In [7]:
"""upgrade the kfp server api version to 1.7.0 for KF 1.4"""
# !{sys.executable} -m pip uninstall -y kfp-server-api
# !{sys.executable} -m pip install --user --upgrade kfp-server-api==1.7.0
"""upgrade the kfp server api version to 1.8.1 for KF 1.5"""
# !{sys.executable} -m pip uninstall -y kfp-server-api
# !{sys.executable} -m pip install --user --upgrade kfp-server-api==1.8.1

'upgrade the kfp server api version to 1.8.1 for KF 1.5'
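
The version pairs named in the comment of cell 6 can be kept in a small lookup table, so the matching pip command is generated instead of typed by hand. This is a minimal sketch: the names `KFP_SERVER_API_FOR_KF` and `pip_install_command` are hypothetical, and the table contains only the pairs stated in this notebook.

```python
import sys

# Kubeflow release -> kfp-server-api version, as listed in this notebook.
KFP_SERVER_API_FOR_KF = {
    "1.4": "1.7.0",
    "1.4.1": "1.7.1",
    "1.5.0": "1.8.1",
}


def pip_install_command(kf_version: str) -> list:
    """Build the pip command that pins kfp-server-api for a Kubeflow release."""
    if kf_version not in KFP_SERVER_API_FOR_KF:
        raise ValueError(f"no known kfp-server-api version for Kubeflow {kf_version}")
    api_version = KFP_SERVER_API_FOR_KF[kf_version]
    return [sys.executable, "-m", "pip", "install", "--user", "--upgrade",
            f"kfp-server-api=={api_version}"]


# Print the command instead of running it, so nothing is installed by accident.
print(" ".join(pip_install_command("1.5.0")))
```

The printed command can be pasted into a cell with a leading `!`, in the same way as the commented-out commands above.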

In [8]:
# import sys
# !{sys.executable} -m pip install --upgrade --user kfp==1.8.12
# !{sys.executable} -m pip install --upgrade --user kubernetes==18.20.0

# Restart the kernel
After updating kfp, restart this notebook kernel.

Jupyter notebook: Menu -> Kernel -> Restart Kernel
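
After the restart, it can help to confirm which package versions the new kernel actually sees. The following sketch uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages this notebook depends on.
for name in ("kfp", "kfp-server-api", "kubernetes"):
    print(f"{name}: {installed_version(name)}")
```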

### Check the total resource limits of my Kubeflow namespace

In [9]:
# run command line to see the quota
!kubectl describe quota

Name:                                                                 kf-resource-quota
Namespace:                                                            kubeflow-kindfor
Resource                                                              Used    Hard
--------                                                              ----    ----
cpu                                                                   3090m   36
csi-s3.storageclass.storage.k8s.io/persistentvolumeclaims             0       10
csi-s3.storageclass.storage.k8s.io/requests.storage                   0       2Ti
kubeflow-nfs-csi.storageclass.storage.k8s.io/persistentvolumeclaims   2       20
kubeflow-nfs-csi.storageclass.storage.k8s.io/requests.storage         20Gi    4Ti
memory                                                                4486Mi  520Gi
microk8s-hostpath.storageclass.storage.k8s.io/persistentvolumeclaims  0       5
microk8s-hostpath.storageclass.storage.k8s.io/requests.storage        0       20Gi
minio
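
To judge how much headroom is left, the `Used` and `Hard` columns can be converted to plain numbers. The sketch below is an assumption-laden helper, not part of kubectl or kfp: the names `parse_cpu` and `parse_bytes` are hypothetical, and only the binary suffixes that appear in the output above (`Mi`, `Gi`, `Ti`, plus `Ki`) and the CPU `m` suffix are handled.

```python
# Binary suffixes used by Kubernetes memory and storage quantities.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '3090m' or '36' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores -> cores
    return float(quantity)


def parse_bytes(quantity: str) -> int:
    """Convert a quantity such as '4486Mi' or '520Gi' to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # no suffix: already in bytes


# Values copied from the quota output above.
cpu_share = parse_cpu("3090m") / parse_cpu("36")
mem_share = parse_bytes("4486Mi") / parse_bytes("520Gi")
print(f"CPU used: {cpu_share:.1%}, memory used: {mem_share:.1%}")
```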

## Setup
Example Pipeline from

https://github.com/kubeflow/examples/tree/master/pipelines/simple-notebook-pipeline

## Getting started with Python function-based components

https://www.kubeflow.org/docs/components/pipelines/sdk/python-function-components/

In [10]:
from platform import python_version

EXPERIMENT_NAME = 'kf v1.5 test'        # Name of the experiment in the UI
EXPERIMENT_DESC = 'testing KF platform'
# BASE_IMAGE = f"library/python:{python_version()}" # Base image used for components in the pipeline, which has not root
BASE_IMAGE = "python:3.8.13"
NAME_SPACE = "kubeflow-kindfor" # change namespace if necessary

In [11]:
import kfp
import kubernetes
import kfp.dsl as dsl
import kfp.compiler as compiler
import kfp.components as components
from kfp.dsl import PipelineVolume

## Connecting KFP Python SDK from Notebook to Pipeline

* https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api/

In [12]:
print(kfp.__version__)
print(kubernetes.__version__)

1.8.14
12.0.1


### Create component from function

https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.components.html


In [13]:
from kfp.components import create_component_from_func
from functools import partial
@partial(
    create_component_from_func,
    output_component_file='add_component.yaml',
    base_image=BASE_IMAGE,
    packages_to_install=None  # [""] is not a valid requirement; use None or a list such as ["pandas"]
)
def add_op(a: float, b: float) -> float:
    '''Calculates sum of two arguments'''
    print(a, '+', b, '=', a + b)
    return a + b

In [14]:
@components.create_component_from_func
def write_to_volume(sum: float):
    with open("/mnt/file.txt", "w") as file:
        file.write(f"The total sum is: {sum}")

## Define a function to restrict ContainerOp resources for kfp with multi-tenancy

Add memory and cpu restriction: 
* https://github.com/kubeflow/pipelines/pull/5695

In [15]:
from kfp.dsl import ContainerOp
def pod_resource_transformer(op: ContainerOp, mem_req="200Mi", cpu_req="2000m", mem_lim="1000Mi", cpu_lim='2000m'):
    """
    Set memory and CPU requests and limits on a pipeline op and return the op.

    Units: '1000Mi' is 1000 mebibytes (about 1 GiB); '1000m' is 1 CPU core.
    """
    return op.set_memory_request(mem_req)\
            .set_memory_limit(mem_lim)\
            .set_cpu_request(cpu_req)\
            .set_cpu_limit(cpu_lim)
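
The transformer relies on each setter returning the op, which is what allows the calls to be chained. The sketch below illustrates this without a cluster or the kfp package: `RecordingOp` is a hypothetical stand-in, not a kfp class, and the transformer is repeated here without its kfp type annotation so the block runs on its own.

```python
class RecordingOp:
    """Hypothetical stand-in for a ContainerOp that records resource settings."""

    def __init__(self):
        self.resources = {}

    def _record(self, key, value):
        self.resources[key] = value
        return self  # returning the op is what makes chaining possible

    def set_memory_request(self, value):
        return self._record("memory_request", value)

    def set_memory_limit(self, value):
        return self._record("memory_limit", value)

    def set_cpu_request(self, value):
        return self._record("cpu_request", value)

    def set_cpu_limit(self, value):
        return self._record("cpu_limit", value)


def pod_resource_transformer(op, mem_req="200Mi", cpu_req="2000m",
                             mem_lim="1000Mi", cpu_lim="2000m"):
    """Copy of the transformer above, without the kfp type annotation."""
    return (op.set_memory_request(mem_req)
              .set_memory_limit(mem_lim)
              .set_cpu_request(cpu_req)
              .set_cpu_limit(cpu_lim))


# Override only the CPU request; the other values fall back to the defaults.
op = pod_resource_transformer(RecordingOp(), cpu_req="500m")
print(op.resources)
```

Overriding only `cpu_req` mirrors how the pipeline below calls the transformer.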

### Volume example
https://www.kubeflow.org/docs/components/pipelines/sdk/manipulate-resources/#persistent-volume-claims-pvcs

https://github.com/kubeflow/pipelines/blob/sdk/release-1.8/samples/core/volume_ops/volume_ops.py

@components.create_component_from_func
def write_to_volume():
    with open("/mnt/file.txt", "w") as file:
        file.write("Hello world")

In [16]:
@dsl.pipeline(
   name='Calculation pipeline', 
   description='A toy pipeline that performs arithmetic calculations.'
)
def calc_pipeline(
   a: float = 0,
   b: float = 7
):  
    # get a volume
    single_cell_volume = PipelineVolume("single-cell-nfs-minio-pvc") # previously created volume
    
    # Passing pipeline parameter and a constant value as operation arguments
    first_add_task = pod_resource_transformer(add_op(a, 4), cpu_req="500m")
    # no value taken from cache
    first_add_task.execution_options.caching_strategy.max_cache_staleness = "P0D"
    first_add_task.set_display_name("add two number")
    # first_add_task.add_pvolumes({"/mnt": single_cell_volume})
    second_add_task = pod_resource_transformer(add_op(first_add_task.output, b), cpu_req="500m")
    # no cache 
    second_add_task.execution_options.caching_strategy.max_cache_staleness = "P0D"
    second_add_task.set_display_name("add two number again")
    # second_add_task.add_pvolumes({"/mnt": single_cell_volume})
    
    save_task = pod_resource_transformer(write_to_volume(second_add_task.output), cpu_req="500m")
    save_task.set_display_name("save to nfs volume")
    save_task.add_pvolumes({"/mnt": single_cell_volume})
    save_task.execution_options.caching_strategy.max_cache_staleness = "P0D"
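
Before submitting a run, the arithmetic and the file format can be checked locally. The following sketch reproduces the three steps as plain Python functions and writes to a temporary directory instead of the mounted `/mnt` volume; the names `add`, `write_sum`, and `dry_run` are hypothetical and no kfp code is involved.

```python
import os
import tempfile


def add(a: float, b: float) -> float:
    """Same arithmetic as add_op, without the container."""
    return a + b


def write_sum(total: float, path: str) -> None:
    """Same file format as write_to_volume, with a configurable path."""
    with open(path, "w") as file:
        file.write(f"The total sum is: {total}")


def dry_run(a: float, b: float, directory: str) -> str:
    """Chain the steps in pipeline order and return the written text."""
    first = add(a, 4)        # first_add_task: a + 4
    second = add(first, b)   # second_add_task: (a + 4) + b
    path = os.path.join(directory, "file.txt")
    write_sum(second, path)  # save_task
    with open(path) as file:
        return file.read()


with tempfile.TemporaryDirectory() as tmp:
    print(dry_run(7.0, 8.0, tmp))  # prints "The total sum is: 19.0"
```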
    

### (optional step) Compile the pipeline to see the settings

In [17]:
PIPE_LINE_FILE_NAME="calc_pipeline_with_resource_limit"
kfp.compiler.Compiler().compile(calc_pipeline, f"{PIPE_LINE_FILE_NAME}.yaml")

# Run Pipeline with Multi-user Isolation

https://www.kubeflow.org/docs/components/pipelines/multi-user/

In [18]:
# get the pipeline host from the environment set up by the notebook instance
client = kfp.Client()

# Make sure the volume is mounted at /run/secrets/kubeflow/pipelines
# client.get_experiment(experiment_name=EXPERIMENT_NAME, namespace=NAME_SPACE)

In [19]:
# client.list_pipelines()

In [20]:
# print(NAME_SPACE)
# client.list_experiments(namespace=NAME_SPACE)
client.set_user_namespace(NAME_SPACE)
print(client.get_user_namespace())

kubeflow-kindfor


In [21]:
exp = client.create_experiment(EXPERIMENT_NAME, description=EXPERIMENT_DESC)

In [22]:
# Specify pipeline argument values
arguments = {'a': '7', 'b': '8'}

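# pipeline-level configuration; op transformers registered here apply to every op
pipeline_config: dsl.PipelineConf = dsl.PipelineConf()

# uncomment to register a transformer (pod_defaults_transformer is not defined in this notebook)
#pipeline_config.add_op_transformer(
#    pod_defaults_transformer
#)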

client.create_run_from_pipeline_func(pipeline_func=calc_pipeline, 
                                     arguments=arguments,
                                     experiment_name=EXPERIMENT_NAME, 
                                     namespace=NAME_SPACE,
                                     pipeline_conf=pipeline_config,
                                     mode = dsl.PipelineExecutionMode.V1_LEGACY)

# The generated links below lead to the Experiment page and the pipeline run details page, respectively

RunPipelineResult(run_id=3663ac74-7a3a-4cd5-982d-56cad7de5e63)