# Kubeflow Introduction

> __Kubeflow is an abstraction layer over `k8s` which enables easier `ML` oriented workflows__

Compared to plain `k8s`, it:
- is more Python-centric
- uses other cluster-specific projects like [`istio`](https://github.com/istio/istio)
- leverages `Service`s out of the box and is all about micro-services defined with `Pipelines` (described later)

## What can I use it for?

- Experimentation and hyper-parameter tuning via [`katib`](https://www.kubeflow.org/docs/components/katib/) (much faster thanks to parallelization across the cluster)
- Deploying and managing complex ML tasks at scale ([`kfserving`](https://www.kubeflow.org/docs/components/kfserving/))
- Monitoring our application (or easily adding the components needed to do so)
- `CI/CD` is not officially supported, although it can be combined quite easily with popular solutions like `GH-Actions`, `Jenkins` or similar
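
To see why parallelization speeds up tuning, here is a minimal sketch using only the Python standard library. It is not `katib`'s API: the `objective` function, the search space and the thread pool are assumptions standing in for real training runs scheduled across a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def objective(learning_rate: float, batch_size: int) -> float:
    """Toy stand-in for a training run; returns a loss to minimize."""
    return (learning_rate - 0.01) ** 2 + abs(batch_size - 64) / 1000


def run_trial(params: dict) -> tuple:
    """Run one trial and return (loss, params)."""
    return objective(**params), params


# Hypothetical search space; every combination is one independent trial.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}
trials = [
    dict(zip(search_space, values))
    for values in product(*search_space.values())
]

# Trials do not depend on each other, so they can run side by side.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_trial, trials))

best_loss, best_params = min(results, key=lambda result: result[0])
print(best_params, best_loss)
```

Because no trial needs the result of another, the wall-clock time shrinks roughly with the number of workers, which is the property a cluster-wide tuner can exploit.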



## Components

> `Kubeflow` provides components for common tasks during a typical `ML` lifecycle

- [Central Dashboard](https://www.kubeflow.org/docs/components/central-dash/) - visualize `pipeline`s, resource usage and other relevant statistics with an easy-to-use `UI`
- [Notebook Servers](https://www.kubeflow.org/docs/components/notebooks/) - `jupyter notebook`s provided as a means of experimentation, data visualization or exploration
- [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/) - a high-level DAG which allows us to specify the necessary steps for deployment/training/tuning/CI etc.
- [KFServing](https://www.kubeflow.org/docs/components/kfserving/) - model serving, similar in idea to `tensorflow`'s `Serving`
- [Katib](https://www.kubeflow.org/docs/components/katib/) - hyperparameter tuning
- [Training Operators](https://www.kubeflow.org/docs/components/training/) - operators used to `train` models with specific frameworks (__mostly `DL` oriented__) for common downstream tasks
- [Multi-Tenancy](https://www.kubeflow.org/docs/components/multi-tenancy/) - IAM, isolation and security, plus sharing resources across teams
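
The "high-level DAG" idea behind Kubeflow Pipelines can be sketched with the standard library alone. The step names below are hypothetical and this is not the `kfp` API. It only shows how declaring dependencies between steps is enough to derive a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "load_data": set(),
    "validate_data": {"load_data"},
    "preprocess": {"load_data"},
    "train": {"preprocess", "validate_data"},
    "evaluate": {"train"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

Steps with no path between them (here `validate_data` and `preprocess`) may appear in either order, and an orchestrator is free to run them at the same time.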

# Dependencies

Before diving into `Kubeflow` itself, we should know which other dependencies (besides `kubernetes`) are required (or optional).

> Only brief explanations of each component are presented; for more information refer to their respective documentation

The __required dependency__ is presented first, followed by an example of an __external dependency__ one could use.

## istio

> __The only required third-party addition__

It is automatically included with `kubeflow` during installation.

The following points outline the need for `istio`:
1. Applications are built using distributed microservices
2. Each microservice is an instance of `Service` with its own `API` and a way to communicate with it
3. __`service mesh` - a network of such services working together__
4. __Such a stack gets progressively harder to manage__

This is where `istio` comes in. Once added, it allows us to:
- Add observability to our services (e.g. gathering metrics)
- Provide `load-balancing`, failure recovery and metrics __via simple `.yaml` configuration__
- Secure and validate services taking part in the mesh (e.g. `TLS` encryption)
- Control traffic between services via `proxies`
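
The role of a proxy in the list above can be illustrated with a small, purely conceptual sketch. It does not use `istio` at all: the `Proxy` class, the flaky service and the retry policy are all hypothetical. The point is that metrics and failure recovery are added around a service without changing the service itself.

```python
from collections import Counter
from typing import Callable


class Proxy:
    """Hypothetical stand-in for a mesh proxy placed in front of a service."""

    def __init__(self, service: Callable[[str], str], max_retries: int = 2):
        self.service = service
        self.max_retries = max_retries
        self.metrics = Counter()

    def call(self, request: str) -> str:
        """Forward the request, retrying on connection errors and counting events."""
        self.metrics["requests"] += 1
        for _ in range(self.max_retries + 1):
            try:
                response = self.service(request)
            except ConnectionError:
                self.metrics["failed_attempts"] += 1
                continue
            self.metrics["successes"] += 1
            return response
        raise ConnectionError(
            f"service unavailable after {self.max_retries + 1} attempts"
        )


def make_flaky_service(failures_before_success: int) -> Callable[[str], str]:
    """Build a toy service that fails a fixed number of times, then succeeds."""
    remaining = failures_before_success

    def service(request: str) -> str:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise ConnectionError("temporary failure")
        return f"handled {request}"

    return service


proxy = Proxy(make_flaky_service(failures_before_success=1))
response = proxy.call("GET /predict")
print(response, dict(proxy.metrics))
```

In a real mesh this behavior is configured declaratively rather than written by hand, but the separation of concerns is the same: the service only handles requests, while the proxy handles everything around them.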

Services before `istio`:

![](./images/istio-before.svg)

Services after the `istio` mesh is applied:

![](./images/istio-after.svg)

### `istio` in `kubeflow`

- Used for applying appropriate policies
- Used for adding the identity of a `user` for other services we communicate with

The creation of a new `Notebook Server` is shown in the image below:

![](./images/istio-in-kubeflow.svg)

## Elyra

> A set of AI-centric extensions to JupyterLab Notebooks.

Its features range from workflow enhancements for data scientists, e.g.:
- `LSP` (Language Server Protocol) integration for `jupyterlab` (__renaming, finding by reference and other features known from `IDE`s__)
- Notebooks navigation via `TOC`
- [Code snippets](https://elyra.readthedocs.io/en/latest/getting_started/overview.html#reusable-code-snippets)

to more advanced features, namely:
- [Visual Pipeline Editor](https://elyra.readthedocs.io/en/latest/getting_started/overview.html#ai-pipelines-visual-editor) - __connect your `notebooks` and `python` scripts to define a workflow__
- One can execute these pipelines locally, via `kubeflow pipelines` (hey there!) or `apache airflow`
- Running `jupyter notebook`s as batch jobs

`Visual` pipelines and related features are leveraged by `kubeflow` to provide a better dev/deploy/visualization experience.

![](./images/elyra-notebook-pipeline.png)

> __See the [FOSS repository](https://github.com/elyra-ai/elyra) and the links above for more information__

# Kubeflow Architecture Overview

You might feel overwhelmed looking at this image at first, __but don't worry, we will explain what is going on in more and more detail as we proceed__

![](./images/kubeflow-pipelines-architecture.png)

We already know a few things, namely:
- The `k8s`-specific part of the Orchestration System
- Controllers (driving the cluster from its __current__ state to the __desired__ state described in the provided `.yaml` settings)
- `k8s` `Node`s which do the "heavy lifting" our application requires
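
As a reminder of what "driving from current to desired state" means, here is a minimal reconciliation loop. It is a conceptual sketch and not Kubernetes code: the workload names, the replica counts and the `reconcile`/`apply_actions` helpers are all hypothetical.

```python
def reconcile(current: dict, desired: dict) -> list:
    """Return the actions needed to move `current` replica counts to `desired`."""
    actions = []
    for name, want in desired.items():
        have = current.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    # Anything running that is no longer declared gets stopped.
    for name, have in current.items():
        if name not in desired and have > 0:
            actions.append(("stop", name, have))
    return actions


def apply_actions(current: dict, actions: list) -> dict:
    """Apply the actions and return the new state (workloads at zero are dropped)."""
    updated = dict(current)
    for verb, name, count in actions:
        delta = count if verb == "start" else -count
        updated[name] = updated.get(name, 0) + delta
    return {name: count for name, count in updated.items() if count > 0}


# Hypothetical states: what the declared settings ask for vs. what is running.
desired = {"trainer": 3, "server": 2}
current = {"trainer": 1, "old-job": 1}

# Keep reconciling until there is nothing left to do.
while actions := reconcile(current, desired):
    print(actions)
    current = apply_actions(current, actions)

print(current)
```

The loop never needs to know how the current state came about. It only compares it with the declared one, which is why the same settings can be re-applied safely.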

Other elements of this image, briefly (going from top to bottom):
- Python SDK - user-defined `pipeline` written using `dsl`. __This `dsl` is transpiled to `k8s`-readable `.yaml` files__
- __Pipeline webserver__ - gathers data from `Service`s with the help of [`istio`](https://istio.io/)'s service mesh
- __Pipeline `Service`__ - service used to watch the transpiled `config` and `apply` it
- __Pipeline `Persistence Agent`__ - watches `k8s` resources created by the `Pipeline Service`:
    - __records which `containers` were executed (and their `inputs` and `outputs`)__
    - These inputs and outputs can be either:
        - `container` parameters
        - `URI`s of data artifacts
- __Orchestration controllers__ - control state and deploy `workload resources` according to the defined DAG:
    - `Argo` is the core controller (see [here](https://github.com/argoproj/argo-workflows/)); __allows us to schedule complicated pipelines with dependencies__
- __Artifact Storage__ - stores relevant data:
    - `metadata` - stored in a `mysql` database backed by a `PersistentVolume`
    - `artifacts` - stored in [`minio`](https://min.io/) or cloud storage (__used for fast read/write access__) backed by a `PersistentVolume`
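
The step from "Python `dsl`" to "`k8s`-readable configuration" can be sketched with a toy compiler. The `Pipeline` class below is a hypothetical mini-DSL and NOT the real `kfp.dsl` API. The image names are made up, and the output is only loosely modeled on an `Argo` `Workflow` manifest, so treat the field layout as an assumption. JSON is emitted because the standard library has no `.yaml` writer.

```python
import json


class Pipeline:
    """Hypothetical mini-DSL; NOT the real `kfp.dsl` API."""

    def __init__(self, name: str):
        self.name = name
        self.tasks = []

    def step(self, image: str, after: tuple = ()):
        """Register the decorated function as a containerized step."""
        def decorator(func):
            self.tasks.append({
                "name": func.__name__,
                "image": image,
                "dependencies": [dep.__name__ for dep in after],
            })
            return func
        return decorator

    def compile(self) -> dict:
        """Turn the registered steps into a declarative, Argo-like spec."""
        dag_tasks = [
            {
                "name": task["name"],
                "template": task["name"],
                "dependencies": task["dependencies"],
            }
            for task in self.tasks
        ]
        containers = [
            {"name": task["name"], "container": {"image": task["image"]}}
            for task in self.tasks
        ]
        return {
            "apiVersion": "argoproj.io/v1alpha1",
            "kind": "Workflow",
            "metadata": {"name": self.name},
            "spec": {
                "entrypoint": "main",
                "templates": [{"name": "main", "dag": {"tasks": dag_tasks}}]
                + containers,
            },
        }


pipeline = Pipeline("demo")


@pipeline.step(image="example/preprocess:latest")  # hypothetical image
def preprocess():
    pass


@pipeline.step(image="example/train:latest", after=(preprocess,))  # hypothetical image
def train():
    pass


spec = pipeline.compile()
manifest = json.dumps(spec, indent=2)
print(manifest)
```

The Python functions are never executed here. They only describe the graph, and everything the orchestrator needs ends up in the generated document.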

# Challenges

## Mandatory

### Components

- __Check how to create a custom `Component` WITHOUT a `python function`__ (e.g. as a standalone program with a custom `Dockerfile` and `yaml` specification) [here](https://www.kubeflow.org/docs/components/pipelines/sdk/component-development/). This approach allows for:
    - Easier customization of the `Dockerfile` (although one can almost always deploy the `image` beforehand and use it from within `python`)
    - A custom `.yaml` file - this one is harder to obtain with `python` functions (e.g. values dependent on `envvars` etc.)
    - __Make sure you have read and understood this part!__
- Check what is required to use `recursion` with `kubeflow`'s `dsl` [here](https://www.kubeflow.org/docs/components/pipelines/sdk/dsl-recursion/)

    
### Integrations

- Check out the [`sidecar injection`](https://istio.io/latest/blog/2019/data-plane-setup/) pattern (used by `istio`). What is it and why is it useful?
- Check out [`kubeflow-kale`](https://github.com/kubeflow-kale/kale) as an even simpler way to define `kubeflow` pipelines.

### DSL

- Go through the [`kfp.dsl`](https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.dsl.html) documentation. What other methods could you use to define the `pipeline`? Check out the [parallel loop](https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.dsl.html#kfp.dsl.ParallelFor)

### Kubernetes

- Check how to directly manipulate `k8s` objects [here](https://www.kubeflow.org/docs/components/pipelines/sdk/manipulate-resources/). __In general, this should not be done__, as `k8s` objects are better provided more "statically", but it is worth knowing one can do this from `kubeflow`.

### Metrics

- Check how to create `metrics` as a part of `pipeline` or as a `component` [here](https://www.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/). Which option (`pipeline` or `component`) should be preferred? 


## Additional

### Components

- Check out [static type checking](https://www.kubeflow.org/docs/components/pipelines/sdk/static-type-checking/). How could one utilize it for increased robustness of the pipeline?

### Visualization

- Check visualization within the `Kubeflow UI` [here](https://www.kubeflow.org/docs/components/pipelines/sdk/output-viewer/). __You might want to use a different approach for `visualization`s though, as `kubeflow`'s focus is a little different__

### Integrations

- Check out [`min.io`](https://min.io/) for a better understanding of the cloud-native and `k8s`-first data format
- Read about the basics of [`argoproj`](https://github.com/argoproj/argo-workflows/), which is used by `kubeflow` to orchestrate workflows
- The `feast` feature store can be used together with `kubeflow`. Check the [relevant documentation](https://www.kubeflow.org/docs/external-add-ons/feature-store/overview/) __and use it once `feast` has been presented by the `AiCore` team!__
- Check out the integrated [tools for serving](https://www.kubeflow.org/docs/external-add-ons/serving/) (includes `NVidia`'s `Triton` and the `bento` project)
- Check out [`kubeflow fairing`](https://www.kubeflow.org/docs/external-add-ons/fairing/fairing-overview/) for an improved hybrid cloud experience with `ML` (it also allows us to run/debug our runs locally). In addition, one can easily open an `endpoint` with a deployed `model`, hence __worth checking out!__