# Kubeflow Fairing Introduction

Kubeflow Fairing is a Python package that streamlines the process of `building`, `training`, and `deploying` machine learning (ML) models in a hybrid cloud environment. By using Kubeflow Fairing and adding a few lines of code, you can run your ML training job locally or in the cloud, directly from Python code or a Jupyter notebook. After your training job is complete, you can use Kubeflow Fairing to deploy your trained model as a prediction endpoint.


# How does Kubeflow Fairing work

Kubeflow Fairing:
1. Packages your Jupyter notebook, Python function, or Python file as a Docker image.
2. Deploys and runs the training job on Kubeflow or AI Platform.
3. Deploys your trained model as a prediction endpoint on Kubeflow after your training job is complete.


# Goals of the Kubeflow Fairing project

- Easily package ML training jobs: Enable ML practitioners to easily package their ML model training code, and their code’s dependencies, as a Docker image.
- Easily train ML models in a hybrid cloud environment: Provide a high-level API for training ML models to make it easy to run training jobs in the cloud, without needing to understand the underlying infrastructure.
- Streamline the process of deploying a trained model: Make it easy for ML practitioners to deploy trained ML models to a hybrid cloud environment.

## Train and deploy model on Kubeflow in Notebooks

This example comes from an upstream Fairing [example](https://github.com/kubeflow/fairing/tree/master/examples/prediction).


See the Kaggle competition [House Prices: Advanced Regression Techniques](https://www.kaggle.com/c/house-prices-advanced-regression-techniques) for details about the ML problem we want to solve.

This notebook introduces you to using Kubeflow Fairing to train and deploy a model to Kubeflow on Amazon EKS. This notebook demonstrates how to:

* Train an XGBoost model in a local notebook,
* Use Kubeflow Fairing to train an XGBoost model remotely on Kubeflow,
* Use Kubeflow Fairing to deploy a trained model to Kubeflow,
* Call the deployed endpoint for predictions.


### Install python dependencies

In [1]:
# Install a pinned release of Kubeflow Fairing from PyPI
!pip install kubeflow-fairing==0.7.1







You are using pip version 19.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
%%writefile requirements.txt
pandas
joblib
numpy
xgboost
scikit-learn>=0.21.0
seldon-core
tornado>=6.0.3

Overwriting requirements.txt


In [3]:
!pip install -r requirements.txt

Collecting seldon-core (from -r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/60/ff/02dfad0ba1497637e8d746c7ad657bcebf174eecff0b3171c1af4cf13808/seldon_core-1.0.2-py3-none-any.whl (88kB)
Collecting redis<4.0.0 (from seldon-core->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/f0/05/1fc7feedc19c123e7a95cfc9e7892eb6cdd2e5df4e9e8af6384349c1cc3d/redis-3.4.1-py2.py3-none-any.whl (71kB)
Collecting opentracing<2.3.0,>=2.2.0 (from seldon-core->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/94/9f/289424136addf621fb4c75624ef9a3a80e8575da3993a87950c57e93217e/opentracing-2.2.0.tar.gz (47kB)
Collecting grpcio-opentracing<1.2.0,>=1.1.4 (from seldon-core->-r 

  Building wheel for opentracing (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/93/e9/b5/1cdc3544f99a54caca13832b5afa26fd98701fe709dc049576
  Building wheel for jaeger-client (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/f2/84/7f/e89da3ee8ce35598d6382b6389fa2ada5d66acca2422537994
  Building wheel for Flask-OpenTracing (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/7b/dc/25/3cf0b35c129232ee596c413f13d1d1f5a8e38c427266276dfd
  Building wheel for threadloop (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/d7/7a/30/d212623a4cd34f6cce400f8122b1b7af740d3440c68023d51f
  Building wheel for thrift (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/02/a2/46/689ccfcf40155c23edc7cdbd9de488611c8fdf49ff34b1706e
Successfully built opentracing jaeger-client Flask-OpenTracing threadloop thrift
kfp 0.1 has requirement google-auth==1.6.1, but you'

In [None]:
# Restart the kernel to pick up pip installed libraries
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")
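After the kernel restarts, it can help to confirm that the installed packages are importable before continuing. The following is a minimal sketch that uses only the standard library. The helper name `missing_modules` is hypothetical and not part of Kubeflow Fairing, and the mapping from pip names to import names (for example, `scikit-learn` to `sklearn`) is an assumption stated in the comments.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names for the packages listed in requirements.txt (assumed mapping:
# scikit-learn -> sklearn, seldon-core -> seldon_core).
required = ["pandas", "joblib", "numpy", "xgboost", "sklearn", "seldon_core", "tornado"]
print("Missing modules:", missing_modules(required))
```

An empty list suggests that the kernel picked up every package; any name that is printed would need to be installed again before running the cells that follow.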

## Train on Kubernetes

This section shows how to run a training job in a Kubernetes cluster. You can use Amazon ECR as your container image registry.

In [1]:
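# Authenticate with ECR.
# `aws ecr get-login` retrieves a token that is valid for the specified registry for 12 hours
# and prints a `docker login` command that includes that authorization token.
# `eval` then executes that command to log in to ECR.
# Note: `get-login` is only available in AWS CLI v1; AWS CLI v2 replaces it with `get-login-password`.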

REGION='us-west-2'
!eval $(aws ecr get-login --no-include-email --region=$REGION)

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
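The authorization token that ECR issues is a base64-encoded string of the form `AWS:<password>`, which `docker login` uses as the username and password. The following is a minimal sketch of decoding such a token. The token here is a made-up placeholder built locally, and `split_ecr_token` is a hypothetical helper, not an AWS or Fairing function.

```python
import base64


def split_ecr_token(authorization_token):
    """Decode a base64 ECR authorization token into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    # Split on the first colon only, in case the password itself contains colons.
    username, password = decoded.split(":", 1)
    return username, password


# Placeholder token for illustration; a real token is issued by ECR.
fake_token = base64.b64encode(b"AWS:example-password").decode("ascii")
username, password = split_ecr_token(fake_token)
print(username)
```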


In [2]:
# Create an ECR repository in the same region
!aws ecr describe-repositories --repository-names fairing-job --region=$REGION || aws ecr create-repository --repository-name fairing-job --region=$REGION

{
    "repositories": [
        {
            "repositoryArn": "arn:aws:ecr:us-west-2:173153674984:repository/fairing-job",
            "registryId": "173153674984",
            "repositoryName": "fairing-job",
            "repositoryUri": "173153674984.dkr.ecr.us-west-2.amazonaws.com/fairing-job",
            "createdAt": 1582400245.0,
            "imageTagMutability": "MUTABLE",
            "imageScanningConfiguration": {
                "scanOnPush": false
            }
        }
    ]
}
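If you need the repository URI in later cells, you can read it from the JSON response rather than copying it by hand. The following sketch parses a trimmed copy of the response shown above; the helper name `repository_uris` is hypothetical.

```python
import json

# Trimmed copy of the `describe-repositories` output shown above.
response_text = """
{
    "repositories": [
        {
            "repositoryName": "fairing-job",
            "repositoryUri": "173153674984.dkr.ecr.us-west-2.amazonaws.com/fairing-job"
        }
    ]
}
"""


def repository_uris(describe_output):
    """Map each repository name to its URI from a describe-repositories response."""
    payload = json.loads(describe_output)
    return {repo["repositoryName"]: repo["repositoryUri"] for repo in payload["repositories"]}


uris = repository_uris(response_text)
print(uris["fairing-job"])
```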


### Develop your model

In [3]:
import argparse
import logging
import joblib
import sys
import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from xgboost import XGBRegressor

logging.basicConfig(format='%(message)s')
logging.getLogger().setLevel(logging.INFO)

In [4]:
def read_input(file_name, test_size=0.25):
    """Read input data and split it into train and test."""
    data = pd.read_csv(file_name)
    data.dropna(axis=0, subset=['SalePrice'], inplace=True)

    y = data.SalePrice
    X = data.drop(['SalePrice'], axis=1).select_dtypes(exclude=['object'])

    train_X, test_X, train_y, test_y = train_test_split(X.values,
                                                        y.values,
                                                        test_size=test_size,
                                                        shuffle=False)

    imputer = SimpleImputer()
    train_X = imputer.fit_transform(train_X)
    test_X = imputer.transform(test_X)

    return (train_X, train_y), (test_X, test_y)

def train_model(train_X,
                train_y,
                test_X,
                test_y,
                n_estimators,
                learning_rate):
    """Train the model using XGBRegressor."""
    model = XGBRegressor(n_estimators=n_estimators, learning_rate=learning_rate)

    model.fit(train_X,
              train_y,
              early_stopping_rounds=40,
              eval_set=[(test_X, test_y)])

    print("Best RMSE on eval: %.2f with %d rounds" %
          (model.best_score,
           model.best_iteration+1))
    return model

def eval_model(model, test_X, test_y):
    """Evaluate the model performance."""
    predictions = model.predict(test_X)
    logging.info("mean_absolute_error=%.2f", mean_absolute_error(predictions, test_y))

def save_model(model, model_file):
    """Save XGBoost model for serving."""
    joblib.dump(model, model_file)
    logging.info("Model export success: %s", model_file)
    
    
class HousingServe(object):
    
    def __init__(self):
        self.train_input = "ames_dataset/train.csv"
        self.n_estimators = 50
        self.learning_rate = 0.1
        self.model_file = "trained_ames_model.dat"
        self.model = None

    def train(self):
        (train_X, train_y), (test_X, test_y) = read_input(self.train_input)
        model = train_model(train_X,
                            train_y,
                            test_X,
                            test_y,
                            self.n_estimators,
                            self.learning_rate)

        eval_model(model, test_X, test_y)
        save_model(model, self.model_file)

    def predict(self, X, feature_names=None):
        """Predict using the model for given ndarray."""
        if not self.model:
            self.model = joblib.load(self.model_file)
        # Do any preprocessing
        # Pass X positionally: the keyword name differs across XGBoost versions.
        prediction = self.model.predict(X)
        # Do any postprocessing
        return prediction
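`eval_model` reports the mean absolute error (MAE) between predictions and true sale prices. As a rough illustration of what that number means, here is a plain-Python sketch of the metric. The function name `mean_absolute_error_sketch` and the sample prices are invented for this example; the notebook itself uses `sklearn.metrics.mean_absolute_error`.

```python
def mean_absolute_error_sketch(y_true, y_pred):
    """Average of |y_true - y_pred| over paired values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up sale prices and predictions, for illustration only.
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 300000.0]
mae = mean_absolute_error_sketch(actual, predicted)
print("mean_absolute_error=%.2f" % mae)
```

Because the absolute difference is symmetric, swapping the two arguments gives the same value, so the argument order used in `eval_model` does not change the reported result.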

### Train an XGBoost model in a notebook

In [5]:
model = HousingServe()
model.train()

[0]	validation_0-rmse:177514
Will train until validation_0-rmse hasn't improved in 40 rounds.
[1]	validation_0-rmse:161858
[2]	validation_0-rmse:147237
[3]	validation_0-rmse:134132
[4]	validation_0-rmse:122224
[5]	validation_0-rmse:111538
[6]	validation_0-rmse:102142
[7]	validation_0-rmse:93392.3
[8]	validation_0-rmse:85824.6
[9]	validation_0-rmse:79667.6
[10]	validation_0-rmse:73463.4
[11]	validation_0-rmse:68059.4
[12]	validation_0-rmse:63350.5
[13]	validation_0-rmse:59732.1
[14]	validation_0-rmse:56260.7
[15]	validation_0-rmse:53392.6
[16]	validation_0-rmse:50770.8
[17]	validation_0-rmse:48107.8
[18]	validation_0-rmse:45923.9
[19]	validation_0-rmse:44154.2
[20]	validation_0-rmse:42488.1
[21]	validation_0-rmse:41263.3
[22]	validation_0-rmse:40212.8
[23]	validation_0-rmse:39089.1
[24]	validation_0-rmse:37691.1
[25]	validation_0-rmse:36875.2
[26]	validation_0-rmse:36276.2
[27]	validation_0-rmse:35444.1
[28]	validation_0-rmse:34831.5
[29]	validation_0-rmse:34205.4
[30]	validation_0-rmse

mean_absolute_error=18173.15
Model export success: trained_ames_model.dat


Best RMSE on eval: 28787.72 with 50 rounds
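The log line "Will train until validation_0-rmse hasn't improved in 40 rounds" refers to early stopping. The following is a simplified sketch of that idea in plain Python, not XGBoost's actual implementation; the function name `early_stopping_round` and the sample scores are assumptions made for illustration.

```python
def early_stopping_round(scores, patience):
    """Return (best_index, stop_index) for a sequence of validation scores.

    Lower scores are better. Training stops at the first round where the best
    score has not improved for `patience` consecutive rounds. If that never
    happens, stop_index is the last round.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best_index = 0
    for i, score in enumerate(scores):
        if score < scores[best_index]:
            best_index = i
        elif i - best_index >= patience:
            return best_index, i
    return best_index, len(scores) - 1


# Illustrative scores: the best value is at index 2, then no improvement.
demo_scores = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1]
print(early_stopping_round(demo_scores, patience=2))

# A steadily improving run of 50 rounds never triggers early stopping.
improving = [100.0 - i for i in range(50)]
print(early_stopping_round(improving, patience=40))
```

Under this sketch, a run whose validation error keeps improving through all 50 rounds ends with its best round being the last one, which is consistent with the "with 50 rounds" message above.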


### Set up Kubeflow Fairing for training and predictions



In [6]:
from kubeflow import fairing
from kubeflow.fairing import TrainJob
from kubeflow.fairing.backends import KubeflowAWSBackend

FAIRING_BACKEND = 'KubeflowAWSBackend'

AWS_ACCOUNT_ID = fairing.cloud.aws.guess_account_id()
AWS_REGION = 'us-west-2'
DOCKER_REGISTRY = '{}.dkr.ecr.{}.amazonaws.com'.format(AWS_ACCOUNT_ID, AWS_REGION)

S3_BUCKET = 'sagemaker-{}-{}'.format(AWS_REGION, AWS_ACCOUNT_ID)
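To see what these format strings produce, the sketch below applies them to the account ID and region that appear in the ECR output earlier in this notebook. The helper names `ecr_registry` and `default_bucket` are hypothetical; in the notebook the account ID comes from `fairing.cloud.aws.guess_account_id()`.

```python
def ecr_registry(account_id, region):
    """Build the ECR registry host name for an account and region."""
    return "{}.dkr.ecr.{}.amazonaws.com".format(account_id, region)


def default_bucket(account_id, region):
    """Build the S3 bucket name used by this notebook for the build context."""
    return "sagemaker-{}-{}".format(region, account_id)


# Account ID and region taken from the describe-repositories output above.
registry = ecr_registry("173153674984", "us-west-2")
bucket = default_bucket("173153674984", "us-west-2")
print(registry)
print(bucket)
```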


In [7]:
import importlib

if FAIRING_BACKEND == 'KubeflowAWSBackend':
    from kubeflow.fairing.builders.cluster.s3_context import S3ContextSource
    BuildContext = S3ContextSource(
        aws_account=AWS_ACCOUNT_ID, region=AWS_REGION,
        bucket_name=S3_BUCKET
    )

BackendClass = getattr(importlib.import_module('kubeflow.fairing.backends'), FAIRING_BACKEND)
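The last line looks up a class by its name at run time, which lets the backend be chosen through the `FAIRING_BACKEND` string. The same technique can be sketched with a standard-library module so that it runs without Fairing installed; `load_attribute` is a hypothetical helper name.

```python
import importlib


def load_attribute(module_name, attribute_name):
    """Import a module by name and return one of its attributes."""
    module = importlib.import_module(module_name)
    return getattr(module, attribute_name)


# Stand-in for the backend lookup: resolve a class from its string name.
OrderedDictClass = load_attribute("collections", "OrderedDict")
instance = OrderedDictClass(a=1)
print(type(instance).__name__)
```

If the name does not exist in the module, `getattr` raises `AttributeError`, so a misspelled backend name fails at this line rather than later during job submission.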

### Train an XGBoost model on Kubeflow
Import the `TrainJob` and use the configured backend class. Kubeflow Fairing packages the `HousingServe` class, the training data, and the training job's software prerequisites as a Docker image. Then Kubeflow Fairing deploys and runs the training job on Kubeflow.


In [8]:
from kubeflow.fairing import TrainJob
train_job = TrainJob(HousingServe, input_files=['ames_dataset/train.csv', "requirements.txt"],
                     docker_registry=DOCKER_REGISTRY,
                     backend=BackendClass(build_context_source=BuildContext))
train_job.submit()

Using default base docker image: registry.hub.docker.com/library/python:3.6.8
Using builder: <class 'kubeflow.fairing.builders.cluster.cluster.ClusterBuilder'>
Building the docker image.
Building image using cluster builder.
/opt/conda/lib/python3.6/site-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
Creating docker context: /tmp/fairing_context__9b5lfbm
/opt/conda/lib/python3.6/site-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
Waiting for fairing-builder-5qk7t-rqqps to start...
Waiting for fairing-builder-5qk7t-rqqps to start...
Waiting for fairing-builder-5qk7t-rqqps to start...
Pod started running True


INFO[0000] Resolved base name registry.hub.docker.com/library/python:3.6.8 to registry.hub.docker.com/library/python:3.6.8
INFO[0000] Resolved base name registry.hub.docker.com/library/python:3.6.8 to registry.hub.docker.com/library/python:3.6.8
INFO[0000] Downloading base image registry.hub.docker.com/library/python:3.6.8
INFO[0001] Error while retrieving image from cache: getting file info: stat /cache/sha256:e2b625c433e2e3c9a72eb92483c7e6ebe32163e320258f6a60badc44d9eb2806: no such file or directory
INFO[0001] Downloading base image registry.hub.docker.com/library/python:3.6.8
INFO[0002] Built cross stage deps: map[]
INFO[0002] Downloading base image registry.hub.docker.com/library/python:3.6.8
INFO[0002] Error while retrieving image from cache: getting file info: stat /cache/sha256:e2b625c433e2e3c9a72eb92483c7e6ebe32163e320258f6a60badc44d9eb2806: no such file or directory
INFO[0002] Downloading base ima

Collecting Jinja2>=2.10.1 (from Flask<2.0.0->seldon-core->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/27/24/4f35961e5c669e96f6559760042a55b9bcfcdb82b9bdb3c8753dbe042e35/Jinja2-2.11.1-py2.py3-none-any.whl (126kB)
Collecting itsdangerous>=0.24 (from Flask<2.0.0->seldon-core->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/76/ae/44b03b253d6fade317f32c24d100b3b35c2239807046a4c953c7b89fa49e/itsdangerous-1.1.0-py2.py3-none-any.whl
Collecting click>=5.1 (from Flask<2.0.0->seldon-core->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whl (81kB)
Collecting Werkzeug>=0.15 (from Flask<2.0.0->seldon-core->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/ba/a5/d6f8a6e71f15364d35678a4ec8a0186f980b3bd2545f40ad51dd26a87fb1/Werkzeug-1.0.0-py2.py3-none-any.whl (298kB)

INFO[0069] Using files from context: [/kaniko/buildcontext/app]
INFO[0069] COPY /app/ /app/
INFO[0069] Taking snapshot of files...


The job fairing-job-df5rd launched.
Waiting for fairing-job-df5rd-kcqgv to start...
Waiting for fairing-job-df5rd-kcqgv to start...
Waiting for fairing-job-df5rd-kcqgv to start...
Pod started running True


[0]	validation_0-rmse:177565.32812
Will train until validation_0-rmse hasn't improved in 40 rounds.
[1]	validation_0-rmse:161967.20312
[2]	validation_0-rmse:148001.90625
[3]	validation_0-rmse:135010.15625
[4]	validation_0-rmse:123514.68750
[5]	validation_0-rmse:113210.39062
[6]	validation_0-rmse:103914.61719
[7]	validation_0-rmse:95352.97656
[8]	validation_0-rmse:87878.78125
[9]	validation_0-rmse:81683.14062
[10]	validation_0-rmse:75828.78906
[11]	validation_0-rmse:70085.50781
[12]	validation_0-rmse:65076.06641
[13]	validation_0-rmse:60899.82031
[14]	validation_0-rmse:57354.23047
[15]	validation_0-rmse:54106.52734
[16]	validation_0-rmse:51402.43359
[17]	validation_0-rmse:48774.05078
[18]	validation_0-rmse:46360.19141
[19]	validation_0-rmse:44304.82812
[20]	validation_0-rmse:42618.65234
[21]	validation_0-rmse:41219.89062
[22]	validation_0-rmse:39885.14453
[23]	validation_0-rmse:38977.96484
[24]	validation_0-rmse:37856.48047
[25]	validation_0-rmse:36739.78125
[26]	validation_0-rmse:35847

Cleaning up job fairing-job-df5rd...


'fairing-job-df5rd'

### Deploy the trained model to Kubeflow for predictions

Import the `PredictionEndpoint` and use the configured backend class. Kubeflow Fairing packages the `HousingServe` class, the trained model, and the prediction endpoint's software prerequisites as a Docker image. Then Kubeflow Fairing deploys and runs the prediction endpoint on Kubeflow.

In [9]:
from kubeflow.fairing import PredictionEndpoint
endpoint = PredictionEndpoint(HousingServe, input_files=['trained_ames_model.dat', "requirements.txt"],
                              docker_registry=DOCKER_REGISTRY,
                              service_type='ClusterIP',
                              backend=BackendClass(build_context_source=BuildContext))
endpoint.create()

Using default base docker image: registry.hub.docker.com/library/python:3.6.8
Using builder: <class 'kubeflow.fairing.builders.cluster.cluster.ClusterBuilder'>
Building the docker image.
Building image using cluster builder.
/opt/conda/lib/python3.6/site-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
Creating docker context: /tmp/fairing_context_oe0hrgmp
/opt/conda/lib/python3.6/site-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
Waiting for fairing-builder-m8k8w-9rxbp to start...
Waiting for fairing-builder-m8k8w-9rxbp to start...
Waiting for fairing-builder-m8k8w-9rxbp to start...
Pod started running True


INFO[0000] Resolved base name registry.hub.docker.com/library/python:3.6.8 to registry.hub.docker.com/library/python:3.6.8
INFO[0000] Resolved base name registry.hub.docker.com/library/python:3.6.8 to registry.hub.docker.com/library/python:3.6.8
INFO[0000] Downloading base image registry.hub.docker.com/library/python:3.6.8
INFO[0001] Error while retrieving image from cache: getting file info: stat /cache/sha256:e2b625c433e2e3c9a72eb92483c7e6ebe32163e320258f6a60badc44d9eb2806: no such file or directory
INFO[0001] Downloading base image registry.hub.docker.com/library/python:3.6.8
INFO[0002] Built cross stage deps: map[]
INFO[0002] Downloading base image registry.hub.docker.com/library/python:3.6.8
INFO[0003] Error while retrieving image from cache: getting file info: stat /cache/sha256:e2b625c433e2e3c9a72eb92483c7e6ebe32163e320258f6a60badc44d9eb2806: no such file or directory
INFO[0003] Downloading base ima

Collecting thrift (from jaeger-client<4.2.0,>=4.1.0->seldon-core->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/97/1e/3284d19d7be99305eda145b8aa46b0c33244e4a496ec66440dac19f8274d/thrift-0.13.0.tar.gz (59kB)
Collecting PyYAML (from pyaml<20.0.0->seldon-core->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/3d/d9/ea9816aea31beeadccd03f1f8b625ecf8f645bd66744484d162d84803ce5/PyYAML-5.3.tar.gz (268kB)
Collecting click>=5.1 (from Flask<2.0.0->seldon-core->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whl (81kB)
Collecting itsdangerous>=0.24 (from Flask<2.0.0->seldon-core->-r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/76/ae/44b03b253d6fade317f32c24d100b3b35c2239807046a4c953c7b89fa49e/itsdangerous-1.1.0-py2.py3-none-any.whl
Collecting Werkzeug>=0.15 (from

INFO[0070] Using files from context: [/kaniko/buildcontext/app]
INFO[0070] COPY /app/ /app/
INFO[0070] Taking snapshot of files...


Deploying the endpoint.
Cluster endpoint: http://fairing-service-kntzt.eksworkshop.svc.cluster.local:5000/predict
Prediction endpoint: http://fairing-service-kntzt.eksworkshop.svc.cluster.local:5000/predict
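The cluster endpoint follows the Kubernetes service DNS form `<service>.<namespace>.svc.cluster.local`, so it is only resolvable from inside the cluster. The following sketch splits the URL printed above into its parts; `parse_cluster_url` is a hypothetical helper, not part of Fairing.

```python
from urllib.parse import urlparse


def parse_cluster_url(url):
    """Split an in-cluster service URL into service, namespace, port, and path."""
    parsed = urlparse(url)
    labels = parsed.hostname.split(".")
    # Expect <service>.<namespace>.svc.<cluster domain>
    if len(labels) < 3 or labels[2] != "svc":
        raise ValueError("not a Kubernetes service DNS name: %s" % parsed.hostname)
    return {
        "service": labels[0],
        "namespace": labels[1],
        "port": parsed.port,
        "path": parsed.path,
    }


parts = parse_cluster_url(
    "http://fairing-service-kntzt.eksworkshop.svc.cluster.local:5000/predict"
)
print(parts)
```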


### Configure the prediction endpoint

**Wait for the service to start, then replace `<endpoint>` with the service host from the output of the previous step.**

In [10]:
# See PR https://github.com/kubeflow/fairing/pull/376.
# Set the endpoint URL explicitly, including `:5000/predict`, to mitigate the issue.

#################
endpoint.url='http://<endpoint>:5000/predict'
#################


### Create a test dataset

In [11]:
# Get sample data and query endpoint
(train_X, train_y), (test_X, test_y) = read_input("ames_dataset/train.csv")
test_X

array([[1096.        ,   20.        ,   78.        , ...,    0.        ,
           3.        , 2007.        ],
       [1097.        ,   70.        ,   60.        , ...,    0.        ,
           3.        , 2007.        ],
       [1098.        ,  120.        ,   69.62831858, ...,    0.        ,
          10.        , 2007.        ],
       ...,
       [1458.        ,   70.        ,   66.        , ..., 2500.        ,
           5.        , 2010.        ],
       [1459.        ,   20.        ,   68.        , ...,    0.        ,
           4.        , 2010.        ],
       [1460.        ,   20.        ,   75.        , ...,    0.        ,
           6.        , 2008.        ]])
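The first row of the test array has Id 1096 because `read_input` splits with `shuffle=False`, so the test set is simply the tail of the file. The sketch below reproduces the split sizes in plain Python, assuming the file has 1460 rows with Ids 1 to 1460 and that no rows are dropped for a missing `SalePrice`. The rounding rule (test count rounded up) mirrors how scikit-learn handles a fractional `test_size`; `ordered_split_sizes` is a hypothetical helper.

```python
import math


def ordered_split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for an unshuffled split with a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# Assumed row count for ames_dataset/train.csv (see the lead-in above).
n_train, n_test = ordered_split_sizes(1460, 0.25)
first_test_id = n_train + 1
print(n_train, n_test, first_test_id)
```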

### Call the endpoint on Kubeflow for predictions

In [12]:
endpoint.predict_nparray(test_X)

ConnectionError: HTTPConnectionPool(host='%3cendpoint%3e', port=5000): Max retries exceeded with url: /predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5a94ea3e80>: Failed to establish a new connection: [Errno -2] Name or service not known',))
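The host in this error, `%3cendpoint%3e`, is the URL-encoded form of `<endpoint>`, which indicates that the placeholder in `endpoint.url` was not replaced with the real service host before the call. A small guard like the following sketch can catch that before sending a request; `has_unreplaced_placeholder` is a hypothetical helper.

```python
from urllib.parse import unquote, urlparse


def has_unreplaced_placeholder(url):
    """Return True if the host part of the URL still contains angle brackets."""
    host = unquote(urlparse(url).netloc)
    return "<" in host or ">" in host


print(unquote("%3cendpoint%3e"))
print(has_unreplaced_placeholder("http://<endpoint>:5000/predict"))
print(has_unreplaced_placeholder(
    "http://fairing-service-kntzt.eksworkshop.svc.cluster.local:5000/predict"
))
```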

### Clean up the prediction endpoint
Delete the prediction endpoint created by this notebook.

In [None]:
endpoint.delete()