# Kubeflow 

## Introduction

> __Kubeflow is an abstraction layer over `k8s` that simplifies the deployment of `ML`-oriented workflows.__

### Kubeflow vs `k8s`

Compared to `k8s`, Kubeflow
- is more Python-centric.
- relies on other cluster-specific projects, such as [`istio`](https://github.com/istio/istio).
- leverages `Service`s out of the box and is centred on micro-services defined with `Pipelines`.

## Applications of Kubeflow
Kubeflow has several uses, including the following:
- Experimentation and hyper-parameter tuning via [`katib`](https://www.kubeflow.org/docs/components/katib/) (tuning is fast because trials are parallelised across the cluster).
- Deployment and management of complex ML tasks at scale ([`kfserving`](https://www.kubeflow.org/docs/components/kfserving/)).
- Application monitoring (the respective monitoring components can be added easily).
- `CI/CD`: although it is not officially supported, Kubeflow can be combined quite easily with popular solutions, including `GH-Actions` and `Jenkins`.
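
The parallel tuning mentioned in the first point can be illustrated without a cluster. The sketch below is plain Python, not the `katib` API: the `objective` function, the search space and the use of threads as stand-ins for cluster workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def objective(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for a training run; returns a loss to minimise."""
    return (learning_rate - 0.01) ** 2 + (batch_size - 64) ** 2 / 10_000


# Hypothetical search space: every combination is one independent trial.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}


def run_trial(params: dict) -> tuple[float, dict]:
    """Evaluate one parameter combination and return (loss, params)."""
    return objective(**params), params


def grid_search(space: dict, max_parallel: int = 4) -> tuple[float, dict]:
    """Run all trials concurrently and return the best (loss, params) pair."""
    keys = list(space)
    trials = [dict(zip(keys, values)) for values in product(*space.values())]
    # Trials do not depend on each other, so they can run side by side.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(run_trial, trials))
    return min(results, key=lambda result: result[0])


best_loss, best_params = grid_search(search_space)
print(best_params)
```

Because the trials are independent, adding workers shortens the wall-clock time of the search without changing its result. A tuning service applies the same idea with cluster resources instead of threads.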

## Components

> `Kubeflow` provides components for the common tasks of a typical `ML` lifecycle.

- [Central Dashboard](https://www.kubeflow.org/docs/components/central-dash/): employed to visualise `pipeline`s, resource usage and other relevant statistics with an intuitive `UI`.
- [Notebook Servers](https://www.kubeflow.org/docs/components/notebooks/): `jupyter notebook`s, provided as a means for experimentation, data visualisation and exploration.
- [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/): high-level DAGs that enable the specification of the necessary steps for deployment/training/tuning/CI, etc.
- [KFServing](https://www.kubeflow.org/docs/components/kfserving/): model serving, similar to `tensorflow`'s `Serving`.
- [Katib](https://www.kubeflow.org/docs/components/katib/): for hyperparameter tuning.
- [Training Operators](https://www.kubeflow.org/docs/components/training/): operators for training models with specific frameworks (__mostly `DL` oriented__), which take care of common training tasks.
- [Multi-Tenancy](https://www.kubeflow.org/docs/components/multi-tenancy/): IAM specification and isolation, security, and resource sharing across teams.
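
To make the "high-level DAG" idea behind Kubeflow Pipelines concrete, here is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+). It is not the Kubeflow Pipelines SDK; the step names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "load_data": set(),
    "preprocess": {"load_data"},
    "train": {"preprocess"},
    "tune": {"preprocess"},
    "deploy": {"train", "tune"},
}


def execution_batches(dag: dict) -> list[list[str]]:
    """Group steps into batches; steps within one batch can run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all steps whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


for batch in execution_batches(pipeline):
    print(batch)
```

Here `train` and `tune` end up in the same batch because both only wait for `preprocess`, while `deploy` runs last because it needs both of them.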

## Dependencies

Before delving into `Kubeflow`, it is important to know which dependencies (besides `kubernetes`) are required.

Here, we provide a brief explanation of each dependency. For more information, refer to the respective documentation.


### istio

> This is __the only required `3rd` party addition.__

It is automatically included with `kubeflow` during installation and is required for the following reasons:

- Applications are built using distributed microservices.
- Each microservice is an instance of a `Service` with a custom `API` and a communication channel.
- Securing and validating services in the __`service mesh`__ (a network of services working together) is difficult.
- __Managing such stacks becomes increasingly difficult as they grow.__

#### Benefits of istio
After installation, `istio` allows us to do the following:
- Add observability for services (e.g. gathering metrics).
- Configure `load-balancing`, failure recovery and metrics collection __via a simple `.yaml` configuration.__
- Secure and validate the services in a `service mesh` (e.g. `TLS` encryption).
- Control traffic between services via `proxies`.
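
The behaviours listed above (load balancing, failure recovery, metrics, traffic passing through a proxy) can be sketched in a few lines of Python. This is a heavily simplified illustration and not how `istio` is implemented or configured, since in practice these behaviours are declared in `.yaml`. The `Proxy` class and the two backend functions are hypothetical.

```python
from itertools import cycle
from typing import Callable, Sequence


class Proxy:
    """Hypothetical stand-in for a mesh proxy sitting in front of service replicas."""

    def __init__(self, backends: Sequence[Callable[[str], str]], max_attempts: int = 3):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = cycle(backends)  # round-robin load balancing
        self._max_attempts = max_attempts
        self.metrics = {"requests": 0, "failures": 0}  # basic observability

    def send(self, request: str) -> str:
        """Forward a request, retrying on the next replica if one fails."""
        self.metrics["requests"] += 1
        last_error = None
        for _ in range(self._max_attempts):
            backend = next(self._backends)
            try:
                return backend(request)
            except ConnectionError as error:
                self.metrics["failures"] += 1
                last_error = error
        raise ConnectionError(f"all {self._max_attempts} attempts failed") from last_error


def healthy(request: str) -> str:
    return f"ok:{request}"


def broken(request: str) -> str:
    raise ConnectionError("replica down")


proxy = Proxy([broken, healthy])
responses = [proxy.send(f"req-{i}") for i in range(3)]
print(responses, proxy.metrics)
```

The calling code never learns that one replica is down: the proxy retries on the healthy replica and only records the failure in its metrics. Keeping this logic out of the services themselves is the main point of a mesh.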

The diagrams below illustrate the flow between `Service`s before and after the `istio` mesh is applied:

*Before*<br><br>
![](./images/istio-before.svg)

*After*<br><br>

![](./images/istio-after.svg)

#### Uses of `istio` in `kubeflow`

- Employed to apply appropriate policies.
- Utilised to attach the identity of the `user` to requests sent to the other services we communicate with.

The flowchart below showcases the components/processes involved in creating a new `Notebook Server`.

![](./images/istio-in-kubeflow.svg)

### Elyra

Here, we present an example of an __external dependency.__

> Elyra consists of a set of AI-centric extensions to JupyterLab Notebooks.

#### Benefits
 
1. Workflow enhancements for data scientists:
    - `LSP` (Language Server Protocol) integration for `jupyterlab` (renaming, finding references, and other features known from `IDE`s).
    - Notebook navigation via `TOC`.
    - [Code snippets](https://elyra.readthedocs.io/en/latest/getting_started/overview.html#reusable-code-snippets).

2. Advanced features:
    - [Visual Pipeline Editor](https://elyra.readthedocs.io/en/latest/getting_started/overview.html#ai-pipelines-visual-editor): this enables you to __connect your `notebooks` and `python` scripts to define a workflow.__
    - These pipelines can be executed locally, or remotely via `kubeflow pipelines` or `apache airflow`.
    - `Jupyter notebook`s can be run as batch jobs.

`Visual` pipelines and related features are leveraged by `kubeflow` to enhance the dev/deploy/visualisation experience.

![](./images/elyra-notebook-pipeline.png)

> __For more information, see the [FOSS repository](https://github.com/elyra-ai/elyra).__

## Kubeflow Architecture Overview

The image below presents an overview of the Kubeflow architecture. It might seem overwhelming at first, but the various elements in the image will be explored in detail as we proceed.

![](./images/kubeflow-pipelines-architecture.png)

### Familiar elements in the image

- The `k8s`-specific part of the orchestration system.
- Controllers (driving the cluster from its __current__ state to the __desired__ state declared in the provided `.yaml` settings).
- `k8s` `Node`s, which do the heavy lifting required by applications.
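
The controller idea (moving from the current to the desired state) boils down to a reconciliation step. The following is a minimal sketch in plain Python, not the `k8s` controller API: the workload names and replica counts are hypothetical, and a real controller would repeat this comparison continuously against the live cluster.

```python
def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[tuple[str, int]]:
    """Return (workload, replica_delta) actions that move `observed` to `desired`."""
    actions = []
    for name in sorted(desired.keys() | observed.keys()):
        delta = desired.get(name, 0) - observed.get(name, 0)
        if delta:
            actions.append((name, delta))  # positive: start replicas, negative: stop
    return actions


def apply_actions(observed: dict[str, int], actions: list[tuple[str, int]]) -> dict[str, int]:
    """Return the new state after applying the actions (the input is not mutated)."""
    state = dict(observed)
    for name, delta in actions:
        count = state.get(name, 0) + delta
        if count:
            state[name] = count
        else:
            state.pop(name, None)  # nothing left running, drop the workload
    return state


# Hypothetical desired state (what the .yaml declares) and observed state.
desired = {"trainer": 3, "notebook": 1}
observed = {"trainer": 1, "old-job": 2}

actions = reconcile(desired, observed)
observed = apply_actions(observed, actions)
print(actions)
print(observed == desired)
```

Once the observed state matches the desired one, `reconcile` returns an empty list, so repeating the loop is harmless. This idempotence is what makes the declarative approach robust.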

### New elements in the image

A brief explanation of the other elements in the image is provided here:

- __Python SDK:__ used to define a `pipeline` with the `dsl`, which is then __transpiled to `k8s` readable `.yaml` files.__
- __Pipeline webserver:__ gathers data from `Service`s via [`istio`](https://istio.io/)'s service mesh.
- __Pipeline `Service`:__ observes the transpiled `config` and `apply`s it.
- __Pipeline `Persistence Agent`:__ watches the `k8s` resources created by the `Pipeline Service` and
    - __records the `containers` that were executed, as well as their `inputs` and `outputs`__ (either `container` parameters or the `URI`s of data artifacts).
- __Orchestration controllers:__ control the state and deploy `workload resources` according to the defined DAG:
    - `Argo` is the core controller (see [here](https://github.com/argoproj/argo-workflows/) for more information) that __allows the scheduling of complicated pipelines with dependencies.__
- __Artifact Storage:__ for storing relevant data, including
    - `metadata` (stored in a `mysql` database backed by a `PersistentVolume`).
    - `artifacts` (stored in [`minio`](https://min.io/), backed by a `PersistentVolume`, or in cloud storage; __used for fast `w/r` access__).
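
The first step above, turning a Python pipeline definition into a manifest that the cluster can read, can be sketched as follows. The `Pipeline` class is a hypothetical mini-DSL, not the Kubeflow Pipelines SDK. The output is only loosely modelled on an `Argo` `Workflow` manifest and has not been validated against a cluster, and the image name and commands are placeholders. The manifest is printed as JSON, which `YAML` parsers also accept.

```python
import json


class Pipeline:
    """Hypothetical mini-DSL that records steps and compiles them to a manifest."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._steps: list[dict] = []

    def step(self, name: str, image: str, command: list[str], after: tuple = ()) -> str:
        """Register a containerised step that runs after the given steps."""
        known = {step["name"] for step in self._steps}
        missing = set(after) - known
        if missing:
            raise ValueError(f"unknown dependencies: {sorted(missing)}")
        self._steps.append(
            {"name": name, "image": image, "command": list(command), "after": list(after)}
        )
        return name

    def compile(self) -> dict:
        """Build a Workflow-shaped dictionary: one DAG template plus one per step."""
        tasks = [
            {"name": s["name"], "template": s["name"], "dependencies": s["after"]}
            for s in self._steps
        ]
        templates = [{"name": "main", "dag": {"tasks": tasks}}]
        templates += [
            {"name": s["name"], "container": {"image": s["image"], "command": s["command"]}}
            for s in self._steps
        ]
        return {
            "apiVersion": "argoproj.io/v1alpha1",
            "kind": "Workflow",
            "metadata": {"generateName": f"{self.name}-"},
            "spec": {"entrypoint": "main", "templates": templates},
        }


pipeline = Pipeline("demo")
prep = pipeline.step("preprocess", "python:3.11", ["python", "-c", "print('prep')"])
pipeline.step("train", "python:3.11", ["python", "-c", "print('train')"], after=[prep])

manifest = pipeline.compile()
print(json.dumps(manifest, indent=2))
```

The design point is the separation of concerns: the Python code only describes steps and their dependencies, while scheduling and execution are left to whatever consumes the manifest (the orchestration controller in the architecture above).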

## Further Reading

### Components

- Read up on [static type checking](https://www.kubeflow.org/docs/components/pipelines/sdk/static-type-checking/) and how to utilise it to increase the robustness of pipelines.

### Visualisation

- Explore visualisation within the `Kubeflow UI` [here](https://www.kubeflow.org/docs/components/pipelines/sdk/output-viewer/). __Consider using a different approach for `visualization`s as `kubeflow`'s target is slightly different.__

### Integrations

- Check out [`min.io`](https://min.io/) to improve your understanding of cloud-native, `k8s`-first object storage.
- Read about the basics of [`argoproj`](https://github.com/argoproj/argo-workflows/), which is used by `kubeflow` to orchestrate workflows.
- Find out if the `Feast` feature store can be used with `kubeflow`. Consult the [relevant documentation](https://www.kubeflow.org/docs/external-add-ons/feature-store/overview/), __and use it after `feast` has been presented by the `AiCore` team.__
- Read about integrated [tools for serving](https://www.kubeflow.org/docs/external-add-ons/serving/), including the `Triton` (`NVIDIA`) and `BentoML` projects.
- Read about [`kubeflow fairing`](https://www.kubeflow.org/docs/external-add-ons/fairing/fairing-overview/) for an improved hybrid cloud experience with `ML` and for executing/debugging runs locally. It also makes it easy to open an `endpoint` for the deployed `model`.

## Conclusion
At this point, you should have a good understanding of

- Kubeflow and its advantages over `k8s`.
- the applications of Kubeflow.
- Kubeflow components and dependencies.
- the Kubeflow architecture.