Nannakaroliina/kubeflow-pipeline-demo

Kubeflow pipeline demo

A Kubeflow pipeline demo for MLOps purposes. This project contains components (see the image below) that preprocess data for an ML model, train the model, and test it by using it to predict. Each component is a separate Docker image that is built from the latest code by a GitHub Actions workflow.

The preprocessing component takes a dataset directly from sklearn and splits it into x and y, and into train and test parts. x_train and x_test are then optimised for more accurate results. The data partitions are saved with numpy to the /app directory. The train component loads the saved data through the arguments given to its ContainerOp and uses it to train a model. The model is saved to the /app directory as model.plk. The testing component loads the saved data and the model through its arguments and outputs the prediction and the score for that prediction.
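As a rough illustration of how a component can receive the saved data through its arguments, here is a minimal sketch of a command-line parser for the train component. The function name parse_train_args and the flag names --x_train and --y_train are assumptions made for this example; the actual flags are whatever the component code in this repository defines.

```python
import argparse


def parse_train_args(argv=None):
    """Parse the file paths that the pipeline passes to the train container.

    The flag names are hypothetical; they only illustrate the pattern of
    handing saved data to a component through container arguments.
    """
    parser = argparse.ArgumentParser(description="Train component (sketch)")
    parser.add_argument("--x_train", required=True,
                        help="Path to the saved training features")
    parser.add_argument("--y_train", required=True,
                        help="Path to the saved training targets")
    # argv=None makes argparse read sys.argv, which is what a container
    # entry point would use; passing a list is handy for local testing.
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Example paths only; they mirror the /app directory used in this README.
    args = parse_train_args(
        ["--x_train", "/app/x_train.npy", "--y_train", "/app/y_train.npy"]
    )
    print(args.x_train, args.y_train)
```

Inside the container the parsed paths would then be handed to numpy to load the arrays before training.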

The pipeline is built with the kfp package, which is the Kubeflow Pipelines SDK. Each op is created as a ContainerOp that wraps a container image. The images for each component are pulled from Docker. For each op, the needed arguments and file_outputs are defined in pipeline.py. The pipeline flow is defined in kubeflow_demo_pipeline, and the pipeline.yaml file is compiled from the pipeline.py code.

[Image: pipeline components]

Code modifications

If you modify pipeline.py, run it to generate a new pipeline.yaml. If you modify the components, remember to push the changes to GitHub so that the Docker images are rebuilt from the newest version.

Requirements

  • Kubernetes
  • Kubeflow Pipelines on a local cluster (kind, K3s, or K3ai)

Installation steps for local Kubeflow Pipelines

There are three options: kind, K3s, or K3ai. The following steps are for K3s; for the other options, please visit: Kubeflow Documentation, Local Deployment

K3s

Uninstall previous versions of K3s:

sh /usr/local/bin/k3s-uninstall.sh

Install K3s:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.8+k3s1 sh -

NOTE

K3s versions after 1.21 do not support some of the APIs that Kubeflow uses. More information at link

Create a cluster:

sudo k3s server &

Check that the cluster exists:

sudo k3s kubectl get node

Deploying Kubeflow Pipelines

The following installation works for all environments (kind, K3s, K3ai).

Deploy Kubeflow Pipelines:

export PIPELINE_VERSION=1.7.1

kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"

kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io

kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"

Verify that the UI is accessible by port forwarding it:

kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

You should see the following when the port-forward is successful:

[Image: terminal output of a successful port-forward]

Now you can access the UI at http://localhost:8080/. If you run into issues, check the installation guide linked above or ask for further assistance.
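If you prefer to check the port-forward from a script rather than a browser, a small standard-library sketch like the one below can help. The function name ui_is_reachable and the 3-second timeout are assumptions made for this example; the URL is the one from the port-forward step above.

```python
from urllib.request import ProxyHandler, build_opener


def ui_is_reachable(url="http://localhost:8080/", timeout=3.0):
    """Return True if an HTTP GET to `url` gets a successful response."""
    # An empty ProxyHandler bypasses any proxy set in the environment,
    # since the port-forward only listens on the local machine.
    opener = build_opener(ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False
```

Calling ui_is_reachable() while the port-forward is running should return True. A False result usually means the port-forward has stopped or the ml-pipeline-ui service is not ready yet.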

Using the Kubeflow Pipelines UI

After opening the UI (see below), click + Upload pipeline to create a new pipeline.

[Image: Kubeflow Pipelines UI]

Set a name and description, upload the pipeline.yaml file for the new pipeline, and click Create.

[Image: upload pipeline form]

This leads to the summary view. Click the + Create run button at the top of the window. The run details contain everything needed for a basic run, so click Start.

[Image: run details view]

Finally, the run view opens and the pipeline starts running. After the run finishes, the results are available in the side panel.