Kubeflow pipeline demo for MLOps purposes. This project contains components (see photo below) that preprocess data for an ML model, train the model, and test it by using it to predict. Each component is a separate Docker container, built from the latest code by a GitHub Actions workflow.
The preprocessing component takes a dataset directly from sklearn and splits it into x and y, and into train and test parts. x_train and x_test are optimised for more accurate results. The data partitions are saved with numpy to the /app directory. The train component loads the saved data through the arguments given to its ContainerOp and uses it to train a model. The model is saved to the /app directory as model.plk. The testing component loads the saved data and model through its arguments and returns the prediction and the score for the prediction.
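The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual component code: the iris dataset, the StandardScaler step (standing in for the unspecified "optimisation"), the LogisticRegression model, and the function names `preprocess`, `train` and `evaluate` are all hypothetical, and a temporary directory stands in for /app.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(out_dir):
    """Split a bundled sklearn dataset and save the four partitions as .npy files."""
    x, y = load_iris(return_X_y=True)  # assumed dataset, the README does not name one
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=0
    )
    # Assumed "optimisation": standardise features, fitting on the train part only.
    scaler = StandardScaler().fit(x_train)
    parts = {
        "x_train": scaler.transform(x_train),
        "x_test": scaler.transform(x_test),
        "y_train": y_train,
        "y_test": y_test,
    }
    paths = {}
    for name, array in parts.items():
        paths[name] = Path(out_dir) / f"{name}.npy"
        np.save(paths[name], array)
    return paths


def train(x_train_path, y_train_path, model_path):
    """Load the saved train data, fit a model, and pickle it to model_path."""
    model = LogisticRegression(max_iter=1000)  # assumed model type
    model.fit(np.load(x_train_path), np.load(y_train_path))
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model_path


def evaluate(x_test_path, y_test_path, model_path):
    """Load the saved test data and model, return predictions and the score."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    x_test, y_test = np.load(x_test_path), np.load(y_test_path)
    return model.predict(x_test), model.score(x_test, y_test)


with tempfile.TemporaryDirectory() as workdir:  # stands in for /app
    paths = preprocess(workdir)
    model_path = train(paths["x_train"], paths["y_train"], Path(workdir) / "model.plk")
    predictions, score = evaluate(paths["x_test"], paths["y_test"], model_path)
    print(predictions[:5], round(score, 3))
```

In the real components each function would live in its own container and receive the file paths as command-line arguments, which is why the sketch passes paths between the steps rather than arrays.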
The pipeline is built with the kfp package, which is the Kubeflow Pipelines SDK. Each individual op is implemented as a container image using ContainerOp. The image for each component is pulled from Docker. For each op, the required arguments and file_outputs are defined in pipeline.py. The pipeline flow is defined in kubeflow_demo_pipeline, and the pipeline.yaml file is compiled from the pipeline.py code.
If you modify pipeline.py, run it to generate a new pipeline.yaml. If you modify the components, remember to push the changes to GitHub so that the Docker images are rebuilt with the newest version.
- Kubernetes
- Kubeflow (kind, K3s, K3ai)
There are three options: kind, K3s or K3ai. The following steps are for K3s; for the other options, please visit: Kubeflow Documentation, Local Deployment
Uninstall Previous Versions of K3s
sh /usr/local/bin/k3s-uninstall.sh
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.8+k3s1 sh -
K3s versions newer than 1.21 no longer support some of the APIs that Kubeflow relies on. More information at link
sudo k3s server &
sudo k3s kubectl get node
The following installation works for all environments (kind, K3s, K3ai).
Deploy the Kubeflow Pipelines
export PIPELINE_VERSION=1.7.1
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"
Verify that the UI is accessible by port forwarding it:
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
You should see this when port-forward is successful.
Now you can access the UI at http://localhost:8080/. If you run into any issues, check the installation guide above or ask for further assistance.
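If you prefer a scripted check over opening a browser, a small sketch like the following can confirm that something answers on the forwarded port. It uses only the Python standard library. The helper name `ui_is_reachable` is hypothetical, and the check only tells you that an HTTP server responded, not that the pipelines backend is healthy.

```python
import http.client
import urllib.error
import urllib.request


def ui_is_reachable(url="http://localhost:8080/", timeout=5.0):
    """Return True if an HTTP GET to url gets a 2xx or 3xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False
```

Call `ui_is_reachable()` while the port-forward command is still running in another terminal; it returns False once the port-forward is stopped.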
After opening the UI (see below), click + Upload pipeline to create a new pipeline.
Set a name and description, upload the pipeline.yaml file for the new pipeline, and click Create.
This leads to the summary view. Click the + Create run button at the top of the window. The run details already contain everything needed for a basic run, so click Start.
Finally, the run view opens and the pipeline starts running. After the run finishes, the results are available in the side panel.