# Kubeflow Fairing Introduction

Kubeflow Fairing is a Python package that streamlines the process of `building`, `training`, and `deploying` machine learning (ML) models in a hybrid cloud environment. By using Kubeflow Fairing and adding a few lines of code, you can run your ML training job locally or in the cloud, directly from Python code or a Jupyter notebook. After your training job is complete, you can use Kubeflow Fairing to deploy your trained model as a prediction endpoint.


# How does Kubeflow Fairing work?

Kubeflow Fairing:
1. Packages your Jupyter notebook, Python function, or Python file as a Docker image.
2. Deploys and runs the training job on Kubeflow or AI Platform.
3. Deploys your trained model as a prediction endpoint on Kubeflow after your training job is complete.


# Goals of the Kubeflow Fairing project

- Easily package ML training jobs: Enable ML practitioners to easily package their ML model training code, and their code’s dependencies, as a Docker image.
- Easily train ML models in a hybrid cloud environment: Provide a high-level API for training ML models to make it easy to run training jobs in the cloud, without needing to understand the underlying infrastructure.
- Streamline the process of deploying a trained model: Make it easy for ML practitioners to deploy trained ML models to a hybrid cloud environment.

In [1]:
# Install the latest Fairing from the GitHub repository
!git clone https://github.com/kubeflow/fairing
!pip install --upgrade fairing/

Cloning into 'fairing'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 3671 (delta 2), reused 1 (delta 1), pack-reused 3661
Receiving objects: 100% (3671/3671), 9.61 MiB | 0 bytes/s, done.
Resolving deltas: 100% (1604/1604), done.
Checking connectivity... done.
Processing ./fairing
Building wheels for collected packages: kubeflow-fairing
  Building wheel for kubeflow-fairing (setup.py) ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-955khx69/wheels/98/1e/af/74029ca3fc6818382f1a3aa55f499bdab9e832420bf6a5f7c4
Successfully built kubeflow-fairing
Installing collected packages: kubeflow-fairing
  Found existing installation: kubeflow-fairing 0.7.1
    Uninstalling kubeflow-fairing-0.7.1:
      Successfully uninstalled kubeflow-fairing-0.7.1
Successfully installed kubeflow-fairing-0.7.1
You are using pip version 19.0.1, however version 20.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
# Check that Fairing is installed
!pip show kubeflow-fairing

Name: kubeflow-fairing
Version: 0.7.1
Summary: Kubeflow Fairing Python SDK.
Home-page: https://github.com/kubeflow/fairing
Author: Kubeflow Authors
Author-email: hejinchi@cn.ibm.com
License: Apache License Version 2.0
Location: /opt/conda/lib/python3.6/site-packages
Requires: numpy, google-cloud-logging, setuptools, future, google-api-python-client, python-dateutil, google-cloud-storage, google-auth, urllib3, six, cloudpickle, retrying, kubeflow-tfjob, kubernetes, oauth2client, httplib2, requests, notebook, boto3, docker, kubeflow-pytorchjob, tornado, kfserving, azure
Required-by: 
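
The same check can also be made from Python, which is useful when a later cell should fail fast if the package is missing. This is a minimal sketch that assumes Python 3.8 or newer for `importlib.metadata` (the Python 3.6 environment shown above does not include it); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("kubeflow-fairing")
if version is None:
    print("kubeflow-fairing is not installed")
else:
    print("kubeflow-fairing", version)
```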


## Basic Example

If you run into any issues, restart the notebook kernel. Newly installed packages are the most likely cause.

Click `Kernel` -> `Restart & Clear Output`

In [3]:
import os
import sys
from kubeflow import fairing
import tensorflow as tf
import numpy as np

def train():
    # Generating random linear data
    # There will be 50 data points ranging from 0 to 50 
    x = np.linspace(0, 50, 50) 
    y = np.linspace(0, 50, 50) 

    # Adding noise to the random linear data 
    x += np.random.uniform(-4, 4, 50) 
    y += np.random.uniform(-4, 4, 50) 

    n = len(x) # Number of data points 

    X = tf.placeholder("float") 
    Y = tf.placeholder("float")
    W = tf.Variable(np.random.randn(), name = "W") 
    b = tf.Variable(np.random.randn(), name = "b") 
    learning_rate = 0.01
    training_epochs = 1000
    
    # Hypothesis 
    y_pred = tf.add(tf.multiply(X, W), b) 

    # Mean Squared Error Cost Function 
    cost = tf.reduce_sum(tf.pow(y_pred-Y, 2)) / (2 * n)

    # Gradient Descent Optimizer 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost) 

    # Global Variables Initializer 
    init = tf.global_variables_initializer() 


    sess = tf.Session()
    sess.run(init) 
      
    # Iterating through all the epochs 
    for epoch in range(training_epochs): 
          
        # Feeding each data point into the optimizer using Feed Dictionary 
        for (_x, _y) in zip(x, y): 
            sess.run(optimizer, feed_dict = {X : _x, Y : _y}) 
          
        # Displaying the result after every 50 epochs 
        if (epoch + 1) % 50 == 0: 
            # Calculating the cost over all data points
            c = sess.run(cost, feed_dict = {X : x, Y : y}) 
            print("Epoch", (epoch + 1), ": cost =", c, "W =", sess.run(W), "b =", sess.run(b)) 
      
    # Storing necessary values to be used outside the Session 
    training_cost = sess.run(cost, feed_dict ={X: x, Y: y}) 
    weight = sess.run(W) 
    bias = sess.run(b) 

    print('Weight: ', weight, 'Bias: ', bias)
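
To make the arithmetic inside `train()` easier to follow, here is a dependency-free sketch of the same update rule: one gradient step per data point on the cost `(w*x + b - y)**2 / (2*n)`. It is an illustration rather than a replacement for the TensorFlow code. The function name `sgd_linear_fit`, the starting values `w=0.0` and `b=0.0` (the cell above starts from random values), and the noise-free sample data are all assumptions made for this example.

```python
def sgd_linear_fit(xs, ys, learning_rate=0.01, epochs=1000, w=0.0, b=0.0):
    """Fit y ~ w * x + b with per-sample gradient descent on (w*x + b - y)**2 / (2 * n)."""
    n = len(xs)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = (w * x + b) - y
            # Gradients of the single-point cost error**2 / (2 * n)
            grad_w = error * x / n
            grad_b = error / n
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    # Same cost as in train(), evaluated over all data points
    cost = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)
    return w, b, cost


# Noise-free data on the line y = 2x + 1: 50 points from 0 to 50
xs = [50 * i / 49 for i in range(50)]
ys = [2 * x + 1 for x in xs]
w, b, cost = sgd_linear_fit(xs, ys)
print("w =", w, "b =", b, "cost =", cost)
```

The slope settles quickly because the inputs are large, while the intercept moves slowly. The same pattern shows up in the training log, where `b` is still drifting after several hundred epochs.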



## Local training for development



In [4]:
train()

[W 200125 20:46:18 deprecation:323] From /opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Colocations handled automatically by placer.
[W 200125 20:46:18 deprecation:323] From /opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.cast instead.


Epoch 50 : cost = 5.432825 W = 1.0061042 b = -0.9009969
Epoch 100 : cost = 5.299083 W = 1.0005927 b = -0.62904495
Epoch 150 : cost = 5.2010303 W = 0.9956901 b = -0.3871324
Epoch 200 : cost = 5.1303577 W = 0.99132895 b = -0.17194141
Epoch 250 : cost = 5.0805893 W = 0.9874496 b = 0.019480195
Epoch 300 : cost = 5.0466795 W = 0.9839987 b = 0.18975767
Epoch 350 : cost = 5.0247154 W = 0.98092896 b = 0.34122685
Epoch 400 : cost = 5.0116663 W = 0.97819835 b = 0.4759648
Epoch 450 : cost = 5.0051928 W = 0.97576934 b = 0.5958195
Epoch 500 : cost = 5.0034966 W = 0.9736086 b = 0.7024358
Epoch 550 : cost = 5.0052023 W = 0.97168654 b = 0.79727525
Epoch 600 : cost = 5.009264 W = 0.96997684 b = 0.8816385
Epoch 650 : cost = 5.0148892 W = 0.9684559 b = 0.956684
Epoch 700 : cost = 5.021487 W = 0.9671031 b = 1.0234382
Epoch 750 : cost = 5.0286155 W = 0.96589965 b = 1.0828211
Epoch 800 : cost = 5.035954 W = 0.96482915 b = 1.135642
Epoch 850 : cost = 5.04327 W = 0.9638769 b = 1.1826284
Epoch 900 : cost = 5.0

## Remote training

This section shows how to run the training job remotely in a Kubernetes cluster. The example uses `ECR` as the container image registry.

In [5]:
# Authenticate with ECR
# `aws ecr get-login` retrieves a token that is valid for the specified registry for 12 hours
# and prints a `docker login` command containing that authorization token.
# `eval` then executes that command to log in to ECR.

REGION='us-west-2'
!eval $(aws ecr get-login --no-include-email --region=$REGION)

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded


In [6]:
# Create an ECR repository in the same region
# If you receive a "RepositoryAlreadyExistsException" error, the repository already
# exists and you can move on to the next step
!aws ecr create-repository --repository-name fairing-job --region=$REGION

{
    "repository": {
        "repositoryArn": "arn:aws:ecr:us-west-2:123384328431:repository/fairing-job",
        "registryId": "123384328431",
        "repositoryName": "fairing-job",
        "repositoryUri": "123384328431.dkr.ecr.us-west-2.amazonaws.com/fairing-job",
        "createdAt": 1579985236.0,
        "imageTagMutability": "MUTABLE",
        "imageScanningConfiguration": {
            "scanOnPush": false
        }
    }
}
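
The `repositoryUri` field in this response is the registry host followed by the repository name. Below is a minimal sketch of separating the two with the standard library. `split_repository_uri` is a hypothetical helper, and the sample text is the response shown above, trimmed to the fields used here.

```python
import json

# Response printed by `aws ecr create-repository` above (trimmed to the fields used here)
response_text = """
{
    "repository": {
        "registryId": "123384328431",
        "repositoryName": "fairing-job",
        "repositoryUri": "123384328431.dkr.ecr.us-west-2.amazonaws.com/fairing-job"
    }
}
"""


def split_repository_uri(response_text):
    """Return (registry_host, repository_name) from a create-repository JSON response."""
    repository = json.loads(response_text)["repository"]
    registry_host, _, repository_name = repository["repositoryUri"].partition("/")
    return registry_host, repository_name


registry_host, repository_name = split_repository_uri(response_text)
print(registry_host)    # 123384328431.dkr.ecr.us-west-2.amazonaws.com
print(repository_name)  # fairing-job
```

The host part has the same `<account>.dkr.ecr.<region>.amazonaws.com` form that the next cell assembles as `DOCKER_REGISTRY`.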


In [7]:
# Setting up AWS Elastic Container Registry (ECR) for storing output containers
# You can use any docker container registry instead of ECR
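AWS_ACCOUNT_ID = fairing.cloud.aws.guess_account_id()
AWS_REGION = 'us-west-2'
DOCKER_REGISTRY = '{}.dkr.ecr.{}.amazonaws.com'.format(AWS_ACCOUNT_ID, AWS_REGION)

# Build the image on top of the base image with the append builder and push it to the registry
fairing.config.set_builder('append', base_image='tensorflow/tensorflow:1.14.0-py3', registry=DOCKER_REGISTRY, push=True)
# Run the training function with the job deployer
fairing.config.set_deployer('job')

if __name__ == '__main__':
    # Wrap train() so that calling it builds the image and runs the job remotely
    remote_train = fairing.config.fn(train)
    remote_train()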

[I 200125 20:47:23 config:125] Using preprocessor: <kubeflow.fairing.preprocessors.function.FunctionPreProcessor object at 0x7f45f79a0f28>
[I 200125 20:47:23 config:127] Using builder: <kubeflow.fairing.builders.append.append.AppendBuilder object at 0x7f45d477dcc0>
[I 200125 20:47:23 config:129] Using deployer: <kubeflow.fairing.deployers.job.job.Job object at 0x7f45d477dcf8>
[W 200125 20:47:23 append:50] Building image using Append builder...
[I 200125 20:47:23 base:107] Creating docker context: /tmp/fairing_context_g55f0j46
[W 200125 20:47:23 base:94] /opt/conda/lib/python3.6/site-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
[I 200125 20:47:23 docker_creds_:234] Loading Docker credentials for repository 'tensorflow/tensorflow:1.14.0-py3'
[W 200125 20:47:24 append:54] Image successfully built in 1.0597115379996467s.
[W 200125 20:47:24 append:94] Pushing image 123384328431.dkr.ecr.us-west-2.amazonaws.com/fairing-job:CFBF035F...
[I 200125 20:47:24

W0125 20:48:02.501451 139805303498560 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:1205: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2020-01-25 20:48:02.525554: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-01-25 20:48:02.530133: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2020-01-25 20:48:02.530433: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4c4ba20 executing computations on platform Host. Devices:
2020-01-25 20:48:02.530456: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
Epoch 50 : cost = 4.522482 W = 0.95778066 b = 1.7785

[W 200125 20:48:16 job:162] Cleaning up job fairing-job-tnrqb...
