# MNIST E2E on Kubeflow on Ncloud Kubernetes Service

이번 가이드에서는 다음 항목을 설명합니다.
  
  1. Tensorflow model으로 분산 학습(distributed training)을 수행합니다. 
  1. TFServing을 이용하여 모델을 Serving합니다.
  1. 배포 및 웹 UI를 통해 모델을 확인합니다.
  
## Requirements

  * Kubeflow가 설치된 Ncloud Kubernetes Service
 

## 노트북 설정

In [1]:
import notebook_setup
from importlib import reload
reload(notebook_setup)
notebook_setup.notebook_setup()

pip installing requirements.txt
Checkout kubeflow/tf-operator @9238906
Adding /home/jovyan/git_tf-operator/sdk/python to python path
Configure docker credentials


## Ncloud Object Storage 설정
* Conainer Registry, Model Serving을 위해 네이버 클라우드 플랫폼의 [Object Storage](https://console.ncloud.com/objectStorage)를 사용합니다.

### API 인증키 설정
* Object Storage에 접근을 위해서는 API 인증키가 필요합니다. API 인증키는 `포털 > 마이페이지 > 계정 관리 > 인증키 관리`([ncloud auth page](https://www.ncloud.com/mypage/manage/authkey))에서 확인할 수 있습니다.

In [2]:
import logging
import os
import uuid
import boto3

region_name = 'kr-standard'
endpoint_url = 'https://kr.object.ncloudstorage.com'
access_key = '<YOUR_ACCESS_KEY_ID>'
secret_key = '<YOUR_SECRET_KEY>'

boto3.client('s3', endpoint_url=endpoint_url, region_name=region_name
             , aws_access_key_id=access_key, aws_secret_access_key=secret_key).list_buckets()

{'ResponseMetadata': {'RequestId': 'a124943b-31cc-49f3-938c-78798d3d1342',
  'HostId': '',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Mon, 12 Apr 2021 08:29:51 GMT',
   'x-clv-request-id': 'a124943b-31cc-49f3-938c-78798d3d1342',
   'x-clv-s3-version': '2.5',
   'accept-ranges': 'bytes',
   'x-amz-request-id': 'a124943b-31cc-49f3-938c-78798d3d1342',
   'content-type': 'application/xml',
   'content-length': '1936'},
  'RetryAttempts': 0},
 'Buckets': [{'Name': 'ai-api',
   'CreationDate': datetime.datetime(2019, 3, 12, 9, 51, 14, 803000, tzinfo=tzlocal())},
  {'Name': 'ai-beta-api',
   'CreationDate': datetime.datetime(2019, 4, 15, 6, 5, 53, 151000, tzinfo=tzlocal())},
  {'Name': 'ch1',
   'CreationDate': datetime.datetime(2020, 2, 12, 5, 51, 33, 403000, tzinfo=tzlocal())},
  {'Name': 'ch2',
   'CreationDate': datetime.datetime(2020, 2, 13, 1, 1, 8, 146000, tzinfo=tzlocal())},
  {'Name': 'cloud-hadoop-test',
   'CreationDate': datetime.datetime(2021, 3, 17, 9, 27, 45, 378000, t

## k8s 설정

In [3]:
import k8s_util
# Force a reload of kubeflow; since kubeflow is a multi namespace module
# it looks like doing this in notebook_setup may not be sufficient
import kubeflow
reload(kubeflow)
from kubernetes import client as k8s_client
from kubernetes import config as k8s_config
from kubeflow.tfjob.api import tf_job_client as tf_job_client_module
from IPython.core.display import display, HTML
import yaml

## Ncloud Container Registry 사용하기

* Kubeflow fairing의 builder에서 생성된 이미지를 저장하기 위해 Container Registry 접근 권한이 필요합니다.
* 자세한 설명은 [Ncloud Continaer Registry docs](https://docs.ncloud.com/ko/container/ncr-1-1.html)을 참고하시기 바랍니다.

## Container Registry 설정

* 빌드한 이미지를 **CONTAINER_REGISTRY**에 저장합니다. <br>
* 네이버 클라우드 플랫폼의 Container Registry를 생성 후 endpoint를 변수에 저장합니다.<br>
* Container Registry는 [여기](https://console.ncloud.com/ncr/registries)에서 생성할 수 있습니다.

### Docker credentials 설정

* [ncloud](https://www.ncloud.com/mypage/manage/authkey) ACCESSKEY와 SECRET KEY를 base64 인코딩 값 확인

`echo -n ACCESSKEY:SECRETKEY | base64`

* endpoint와 base64 인코딩 값을 이요하여 config.json 파일 생성
```json
{
	"auths": {
		"${REGISTRY_NAME}.kr.ncr.ntruss.com": {
			"auth": "xxxxxxxxxxxxxxx"
		}
	}
}
```


### Namespace 안에 kubernetes Comfigmap 생성

```
kubectl create --namespace ${NAMESPACE} configmap \
    docker-config --from-file=<path to config.json>
```
### Kubernetes에서 Container registry로 접근 허용 설정
```
kubectl -n ${NAMESPACE} create secret generic regcred \
    --from-file=.dockerconfigjson=<path to config.json> \
    --type=kubernetes.io/dockerconfigjson

kubectl -n ${NAMESPACE} patch serviceaccount default \
    -p '{"imagePullSecrets": [{"name": "regcred"}]}'
kubectl -n ${NAMESPACE} patch serviceaccount default-editor \
    -p '{"imagePullSecrets": [{"name": "regcred"}]}'
kubectl -n ${NAMESPACE} patch serviceaccount default-viewer \
    -p '{"imagePullSecrets": [{"name": "regcred"}]}'
```

In [4]:
from kubeflow import fairing   
from kubeflow.fairing import utils as fairing_utils

# You can use any docker container registry
CONTAINER_REGISTRY = 'mnist-e2e.kr.ncr.ntruss.com'

namespace = fairing_utils.get_current_k8s_namespace()

logging.info(f"Running in namespace {namespace}")
logging.info(f"Using ncloud container registry {CONTAINER_REGISTRY}")

Running in namespace cluster-notebook
Using ncloud container registry mnist-e2e.kr.ncr.ntruss.com


In [5]:
from kubeflow.fairing.builders import cluster
from kubeflow.fairing.deployers import job
from kubeflow.fairing.preprocessors import base as base_preprocessor

# output_map is a map of extra files to add to the notebook.
# It is a map from source location to the location inside the context.
output_map =  {
    "Dockerfile.model": "Dockerfile",
    "model.py": "model.py"
}

preprocessor = base_preprocessor.BasePreProcessor(
    command=["python"], # The base class will set this.
    input_files=[],
    path_prefix="/app", # irrelevant since we aren't preprocessing any files
    output_map=output_map)

preprocessor.preprocess()

set()

In [6]:
from kubeflow.fairing.cloud.k8s import MinioUploader
from kubeflow.fairing.builders.cluster.minio_context import MinioContextSource

region_name = 'kr-standard'
s3_endpoint = 'kr.object.ncloudstorage.com'
endpoint_url = 'https://'+s3_endpoint

s3_uploader = MinioUploader(
    endpoint_url=endpoint_url,
    minio_secret=access_key,
    minio_secret_key=secret_key,
    region_name=region_name
)

context_source = MinioContextSource(
    endpoint_url=endpoint_url,
    minio_secret=access_key,
    minio_secret_key=secret_key,
    region_name=region_name,
)

In [7]:
# Use a Tensorflow image as the base image
# We use a custom Dockerfile 
cluster_builder = cluster.cluster.ClusterBuilder(registry=CONTAINER_REGISTRY,
                                                 base_image="", # base_image is set in the Dockerfile
                                                 preprocessor=preprocessor,
                                                 image_name="mnist",
                                                 dockerfile_path="Dockerfile",
                                                 context_source=context_source)
cluster_builder.build()
logging.info(f"Built image {cluster_builder.image_tag}")

Building image using cluster builder.
Creating docker context: /tmp/fairing_context_twvrs5qf
Dockerfile already exists in Fairing context, skipping...
Waiting for fairing-builder-dpflt-rkqbd to start...
Waiting for fairing-builder-dpflt-rkqbd to start...
Waiting for fairing-builder-dpflt-rkqbd to start...
Pod started running True


[36mINFO[0m[0000] Retrieving image manifest tensorflow/tensorflow:1.15.2-py3
[36mINFO[0m[0003] Retrieving image manifest tensorflow/tensorflow:1.15.2-py3
[36mINFO[0m[0004] Built cross stage deps: map[]
[36mINFO[0m[0004] Retrieving image manifest tensorflow/tensorflow:1.15.2-py3
[36mINFO[0m[0005] Retrieving image manifest tensorflow/tensorflow:1.15.2-py3
[36mINFO[0m[0007] Executing 0 build triggers
[36mINFO[0m[0007] Unpacking rootfs as cmd ADD model.py /opt/model.py requires it.
[36mINFO[0m[0026] Taking snapshot of full filesystem...
[36mINFO[0m[0032] Resolving 27182 paths
[36mINFO[0m[0035] Using files from context: [/kaniko/buildcontext/model.py]
[36mINFO[0m[0035] ADD model.py /opt/model.py
[36mINFO[0m[0035] RUN chmod +x /opt/model.py
[36mINFO[0m[0035] cmd: /bin/sh
[36mINFO[0m[0035] args: [-c chmod +x /opt/model.py]
[36mINFO[0m[0035] Running: [/bin/sh -c chmod +x /opt/model.py]
[36mINFO[0m[0035] ENTRYPOINT ["/usr/local/bin/python"]
[36mINFO[0m[0035] CM

Built image mnist-e2e.kr.ncr.ntruss.com/mnist:F6ECFC99


## Object Storage 버킷 생성
* 모델과 결과를 저장하기 위해 네이버 클라우드 플랫폼의 [Object Storage](https://console.ncloud.com/objectStorage) 버킷을 생성합니다.
* 미리 생성해둔 버킷을 사용할 수도 있습니다.

In [8]:
bucket = f"{namespace}-mnist"
s3_uploader.create_bucket(bucket)
logging.info(f"Bucket {bucket} created or already exists")

Bucket cluster-notebook-mnist created or already exists


## Distributed Job 설정

In [9]:
train_name = f"mnist-train-{uuid.uuid4().hex[:4]}"
num_ps = 1
num_workers = 2
model_dir = f"s3://{bucket}/mnist"
export_path = f"s3://{bucket}/mnist/export"
train_steps = 200
batch_size = 100
learning_rate = .01
image = cluster_builder.image_tag

In [10]:
train_spec = f"""apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: {train_name}  
spec:
  tfReplicaSpecs:
    Ps:
      replicas: {num_ps}
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          serviceAccount: default-editor
          containers:
          - name: tensorflow
            command:
            - python
            - /opt/model.py
            - --tf-model-dir={model_dir}
            - --tf-export-dir={export_path}
            - --tf-train-steps={train_steps}
            - --tf-batch-size={batch_size}
            - --tf-learning-rate={learning_rate}
            env:
            - name: S3_ENDPOINT
              value: {s3_endpoint}
            - name: AWS_ENDPOINT_URL
              value: {endpoint_url}
            - name: AWS_REGION
              value: {region_name}
            - name: BUCKET_NAME
              value: {bucket}
            - name: S3_USE_HTTPS
              value: "0"
            - name: S3_VERIFY_SSL
              value: "0"
            - name: AWS_ACCESS_KEY_ID
              value: {access_key}
            - name: AWS_SECRET_ACCESS_KEY
              value: {secret_key}
            image: {image}
            workingDir: /opt
          restartPolicy: OnFailure
    Chief:
      replicas: 1
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          serviceAccount: default-editor
          containers:
          - name: tensorflow
            command:
            - python
            - /opt/model.py
            - --tf-model-dir={model_dir}
            - --tf-export-dir={export_path}
            - --tf-train-steps={train_steps}
            - --tf-batch-size={batch_size}
            - --tf-learning-rate={learning_rate}
            env:
            - name: S3_ENDPOINT
              value: {s3_endpoint}
            - name: AWS_ENDPOINT_URL
              value: {endpoint_url}
            - name: AWS_REGION
              value: {region_name}
            - name: BUCKET_NAME
              value: {bucket}
            - name: S3_USE_HTTPS
              value: "0"
            - name: S3_VERIFY_SSL
              value: "0"
            - name: AWS_ACCESS_KEY_ID
              value: {access_key}
            - name: AWS_SECRET_ACCESS_KEY
              value: {secret_key}
            image: {image}
            workingDir: /opt
          restartPolicy: OnFailure
    Worker:
      replicas: 1
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          serviceAccount: default-editor
          containers:
          - name: tensorflow
            command:
            - python
            - /opt/model.py
            - --tf-model-dir={model_dir}
            - --tf-export-dir={export_path}
            - --tf-train-steps={train_steps}
            - --tf-batch-size={batch_size}
            - --tf-learning-rate={learning_rate}
            env:
            - name: S3_ENDPOINT
              value: {s3_endpoint}
            - name: AWS_ENDPOINT_URL
              value: {endpoint_url}
            - name: AWS_REGION
              value: {region_name}
            - name: BUCKET_NAME
              value: {bucket}
            - name: S3_USE_HTTPS
              value: "0"
            - name: S3_VERIFY_SSL
              value: "0"
            - name: AWS_ACCESS_KEY_ID
              value: {access_key}
            - name: AWS_SECRET_ACCESS_KEY
              value: {secret_key}
            image: {image}
            workingDir: /opt
          restartPolicy: OnFailure
""" 


logging.info(f"{train_spec}")

apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train-253c  
spec:
  tfReplicaSpecs:
    Ps:
      replicas: 1
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          serviceAccount: default-editor
          containers:
          - name: tensorflow
            command:
            - python
            - /opt/model.py
            - --tf-model-dir=s3://cluster-notebook-mnist/mnist
            - --tf-export-dir=s3://cluster-notebook-mnist/mnist/export
            - --tf-train-steps=200
            - --tf-batch-size=100
            - --tf-learning-rate=0.01
            env:
            - name: S3_ENDPOINT
              value: kr.object.ncloudstorage.com
            - name: AWS_ENDPOINT_URL
              value: https://kr.object.ncloudstorage.com
            - name: AWS_REGION
              value: kr-standard
            - name: BUCKET_NAME
              value: cluster-notebook-mnist
            - name: S3

### Training Job 생성

In [11]:
tf_job_client = tf_job_client_module.TFJobClient()
tf_job_body = yaml.safe_load(train_spec)
tf_job = tf_job_client.create(tf_job_body, namespace=namespace)  

logging.info(f"Created job {namespace}.{train_name}")

Created job cluster-notebook.mnist-train-253c


In [12]:
from kubeflow.tfjob import TFJobClient
tfjob_client = TFJobClient()
tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)

NAME                           STATE                TIME                          
mnist-train-253c               Running              2021-04-12T08:32:06Z          
mnist-train-253c               Succeeded            2021-04-12T08:32:20Z          


## TensorFlow log 확인

In [13]:
tfjob_client.get_logs(train_name, namespace=namespace)

The logs of Pod mnist-train-253c-chief-0:


W0412 08:32:07.618860 139922035803968 module_wrapper.py:139] From /opt/model.py:153: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0412 08:32:07.619006 139922035803968 module_wrapper.py:139] From /opt/model.py:153: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0412 08:32:07.619900 139922035803968 module_wrapper.py:139] From /opt/model.py:158: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

INFO:tensorflow:TF_CONFIG {"cluster":{"chief":["mnist-train-253c-chief-0.cluster-notebook.svc:2222"],"ps":["mnist-train-253c-ps-0.cluster-notebook.svc:2222"],"worker":["mnist-train-253c-worker-0.cluster-notebook.svc:2222"]},"task":{"type":"chief","index":0},"environment":"cloud"}
I0412 08:32:07.619978 139922035803968 model.py:158] TF_CONFIG {"cluster":{"chief":["mnist-train-253c-chief-0.cluster-notebook.svc:2222"],"p

## Obejct Storage 모델 확인

In [14]:
#TODO(swiftdiaries): Check object key for model specifically
from botocore.exceptions import ClientError

try:
    model_response = s3_uploader.client.list_objects(Bucket=bucket)
    # Minimal check to see if at least the bucket is created
    if model_response["ResponseMetadata"]["HTTPStatusCode"] == 200:
        logging.info(f"{model_dir} found in {bucket} bucket")
except ClientError as err:
    logging.error(err)

s3://cluster-notebook-mnist/mnist found in cluster-notebook-mnist bucket


## Tensor Board 설정

In [15]:
tb_name = "mnist-tensorboard"

tb_deploy = f"""apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mnist-tensorboard
  name: {tb_name}
  namespace: {namespace}
spec:
  selector:
    matchLabels:
      app: mnist-tensorboard
  template:
    metadata:
      labels:
        app: mnist-tensorboard
        version: v1
    spec:
      serviceAccount: default-editor
      containers:
      - command:
        - /usr/local/bin/tensorboard
        - --logdir={model_dir}
        - --port=80
        image: tensorflow/tensorflow:1.15.2-py3
        env:
        - name: S3_ENDPOINT
          value: {s3_endpoint}
        - name: AWS_ENDPOINT_URL
          value: {endpoint_url}
        - name: AWS_REGION
          value: {region_name}
        - name: BUCKET_NAME
          value: {bucket}
        - name: S3_USE_HTTPS
          value: "0"
        - name: S3_VERIFY_SSL
          value: "0"
        - name: AWS_ACCESS_KEY_ID
          value: {access_key}
        - name: AWS_SECRET_ACCESS_KEY
          value: {secret_key}  
        name: tensorboard
        ports:
        - containerPort: 80
"""
tb_service = f"""apiVersion: v1
kind: Service
metadata:
  labels:
    app: mnist-tensorboard
  name: {tb_name}
  namespace: {namespace}
spec:
  ports:
  - name: http-tb
    port: 80
    targetPort: 80
  selector:
    app: mnist-tensorboard
  type: ClusterIP
"""

tb_virtual_service = f"""apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: {tb_name}
  namespace: {namespace}
spec:
  gateways:
  - kubeflow/kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /mnist/{namespace}/tensorboard/
    rewrite:
      uri: /
    route:
    - destination:
        host: {tb_name}.{namespace}.svc.cluster.local
        port:
          number: 80
    timeout: 300s
"""

tb_specs = [tb_deploy, tb_service, tb_virtual_service]

In [16]:
k8s_util.apply_k8s_specs(tb_specs, k8s_util.K8S_CREATE_OR_REPLACE)

  spec = yaml.load(spec)
Deleted Deployment cluster-notebook.mnist-tensorboard
Created Deployment cluster-notebook.mnist-tensorboard
Deleted Service cluster-notebook.mnist-tensorboard
Created Service cluster-notebook.mnist-tensorboard
Deleted VirtualService cluster-notebook.mnist-tensorboard
Created VirtualService mnist-tensorboard.mnist-tensorboard


[{'api_version': 'apps/v1',
  'kind': 'Deployment',
  'metadata': {'annotations': None,
               'cluster_name': None,
               'creation_timestamp': datetime.datetime(2021, 4, 12, 8, 32, 38, tzinfo=tzlocal()),
               'deletion_grace_period_seconds': None,
               'deletion_timestamp': None,
               'finalizers': None,
               'generate_name': None,
               'generation': 1,
               'initializers': None,
               'labels': {'app': 'mnist-tensorboard'},
               'managed_fields': None,
               'name': 'mnist-tensorboard',
               'namespace': 'cluster-notebook',
               'owner_references': None,
               'resource_version': '1860397',
               'self_link': '/apis/apps/v1/namespaces/cluster-notebook/deployments/mnist-tensorboard',
               'uid': 'ffaf597f-f73f-487b-8b6d-1dab102dbcca'},
  'spec': {'min_ready_seconds': None,
           'paused': None,
           'progress_deadline_seco

## Tensorboard URL 확인
* URL을 확인하기 위해 RBAC 권한이 필요합니다.
* 해당 권한이 없을 경우 나타나는 로그 메세지의 명령을 실행하면 권한을 부여할 수 잇습니다.

In [17]:
from kubernetes.client.rest import ApiException
api_client = k8s_client.CoreV1Api()

istio_ingress_endpoint = None
try:
    istio_ingress_endpoint = api_client.read_namespaced_service(name='istio-ingressgateway', namespace='istio-system')
    istio_ports = istio_ingress_endpoint.spec.ports
    for istio_port in istio_ports:
        if istio_port.name == "http2":
            logging.warning("get ingress-host by running kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'")
            logging.info(f"Tensorboard URL: <ingress-host>/mnist/{namespace}/tensorboard/")
except ApiException as e:
    if e.status == 403:
        logging.warning(f"The service account doesn't have sufficient privileges "
                      f"to get the kubeflow service. "
                      f"You will have to manually enter the cluster-ip. "
                      f"To make this function work ask someone with cluster "
                      f"priveleges to create an appropriate "
                      f"clusterrolebinding by running a command.\n"
                      f"kubectl create --namespace=istio-system rolebinding "
                       "--clusterrole=kubeflow-view "
                       "--serviceaccount=${NAMESPACE}:default-editor "
                       "${NAMESPACE}-ingressgateway-view")
        logging.warn("API Access restricted. Please get URL by running the kubectl commands at the end of the notebook")

get worker-node-ip by running 'kubectl get nodes -o wide'
Tensorboard URL: <ingress-host>:31380/mnist/cluster-notebook/tensorboard/


## Model Serving

In [18]:
deploy_name = "mnist-model"
model_base_path = export_path

# The web ui defaults to mnist-service so if you change it you will
# need to change it in the UI as well to send predictions to the mode
model_service = "mnist-service"

deploy_spec = f"""apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mnist
  name: {deploy_name}
  namespace: {namespace}
spec:
  selector:
    matchLabels:
      app: mnist-model
  template:
    metadata:
      # TODO(jlewi): Right now we disable the istio side car because otherwise ISTIO rbac will prevent the
      # UI from sending RPCs to the server. We should create an appropriate ISTIO rbac authorization
      # policy to allow traffic from the UI to the model servier.
      # https://istio.io/docs/concepts/security/#target-selectors
      annotations:        
        sidecar.istio.io/inject: "false"
      labels:
        app: mnist-model
        version: v1
    spec:
      serviceAccount: default-editor
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=mnist
        - --model_base_path={model_base_path}
        command:
        - /usr/bin/tensorflow_model_server
        env:
        - name: modelBasePath
          value: {model_base_path}
        - name: S3_ENDPOINT
          value: {s3_endpoint}
        - name: AWS_ENDPOINT_URL
          value: {endpoint_url}
        - name: AWS_REGION
          value: {region_name}
        - name: BUCKET_NAME
          value: {bucket}
        - name: S3_USE_HTTPS
          value: "0"
        - name: S3_VERIFY_SSL
          value: "0"
        - name: AWS_ACCESS_KEY_ID
          value: {access_key}
        - name: AWS_SECRET_ACCESS_KEY
          value: {secret_key}  
        image: tensorflow/serving:1.15.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: mnist
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
        - mountPath: /var/config/
          name: model-config
      volumes:
      - configMap:
          name: {deploy_name}
        name: model-config
"""

service_spec = f"""apiVersion: v1
kind: Service
metadata:
  annotations:    
    prometheus.io/path: /monitoring/prometheus/metrics
    prometheus.io/port: "8500"
    prometheus.io/scrape: "true"
  labels:
    app: mnist-model
  name: {model_service}
  namespace: {namespace}
spec:
  ports:
  - name: grpc-tf-serving
    port: 9000
    targetPort: 9000
  - name: http-tf-serving
    port: 8500
    targetPort: 8500
  selector:
    app: mnist-model
  type: ClusterIP
"""

monitoring_config = f"""kind: ConfigMap
apiVersion: v1
metadata:
  name: {deploy_name}
  namespace: {namespace}
data:
  monitoring_config.txt: |-
    prometheus_config: {{
      enable: true,
      path: "/monitoring/prometheus/metrics"
    }}
"""

model_specs = [deploy_spec, service_spec, monitoring_config]

In [19]:
k8s_util.apply_k8s_specs(model_specs, k8s_util.K8S_CREATE_OR_REPLACE)    

Deleted Deployment cluster-notebook.mnist-model
Created Deployment cluster-notebook.mnist-model
Deleted Service cluster-notebook.mnist-service
Created Service cluster-notebook.mnist-service
Deleted ConfigMap cluster-notebook.mnist-model
Created ConfigMap cluster-notebook.mnist-model


[{'api_version': 'apps/v1',
  'kind': 'Deployment',
  'metadata': {'annotations': None,
               'cluster_name': None,
               'creation_timestamp': datetime.datetime(2021, 4, 12, 8, 32, 46, tzinfo=tzlocal()),
               'deletion_grace_period_seconds': None,
               'deletion_timestamp': None,
               'finalizers': None,
               'generate_name': None,
               'generation': 1,
               'initializers': None,
               'labels': {'app': 'mnist'},
               'managed_fields': None,
               'name': 'mnist-model',
               'namespace': 'cluster-notebook',
               'owner_references': None,
               'resource_version': '1860502',
               'self_link': '/apis/apps/v1/namespaces/cluster-notebook/deployments/mnist-model',
               'uid': 'b5487704-882a-446e-8da3-3be650432bec'},
  'spec': {'min_ready_seconds': None,
           'paused': None,
           'progress_deadline_seconds': 600,
           'r

##  Mnist UI 배포

* mnist 결과를 시각화하는 UI를 배포합니다.
* UI는 미리 빌드된 public docker image를 사용합니다.

In [20]:
ui_name = "mnist-ui"
ui_deploy = f"""apiVersion: apps/v1
kind: Deployment
metadata:
  name: {ui_name}
  namespace: {namespace}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mnist-web-ui
  template:
    metadata:
      labels:
        app: mnist-web-ui
    spec:
      containers:
      - image: gcr.io/kubeflow-examples/mnist/web-ui:v20190112-v0.2-142-g3b38225
        name: web-ui
        ports:
        - containerPort: 5000        
      serviceAccount: default-editor
"""

ui_service = f"""apiVersion: v1
kind: Service
metadata:
  annotations:
  name: {ui_name}
  namespace: {namespace}
spec:
  ports:
  - name: http-mnist-ui
    port: 80
    targetPort: 5000
  selector:
    app: mnist-web-ui
  type: ClusterIP
"""

ui_virtual_service = f"""apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: {ui_name}
  namespace: {namespace}
spec:
  gateways:
  - kubeflow/kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /mnist/{namespace}/ui/
    rewrite:
      uri: /
    route:
    - destination:
        host: {ui_name}.{namespace}.svc.cluster.local
        port:
          number: 80
    timeout: 300s
"""

ui_specs = [ui_deploy, ui_service, ui_virtual_service]

In [21]:
k8s_util.apply_k8s_specs(ui_specs, k8s_util.K8S_CREATE_OR_REPLACE)

Deleted Deployment cluster-notebook.mnist-ui
Created Deployment cluster-notebook.mnist-ui
Deleted Service cluster-notebook.mnist-ui
Created Service cluster-notebook.mnist-ui
Deleted VirtualService cluster-notebook.mnist-ui
Created VirtualService mnist-ui.mnist-ui


[{'api_version': 'apps/v1',
  'kind': 'Deployment',
  'metadata': {'annotations': None,
               'cluster_name': None,
               'creation_timestamp': datetime.datetime(2021, 4, 12, 8, 32, 49, tzinfo=tzlocal()),
               'deletion_grace_period_seconds': None,
               'deletion_timestamp': None,
               'finalizers': None,
               'generate_name': None,
               'generation': 1,
               'initializers': None,
               'labels': None,
               'managed_fields': None,
               'name': 'mnist-ui',
               'namespace': 'cluster-notebook',
               'owner_references': None,
               'resource_version': '1860555',
               'self_link': '/apis/apps/v1/namespaces/cluster-notebook/deployments/mnist-ui',
               'uid': '9456cb86-909c-4fe1-883c-c7fc4e4016dc'},
  'spec': {'min_ready_seconds': None,
           'paused': None,
           'progress_deadline_seconds': 600,
           'replicas': 1,
     

## Web UI 접속
* URL을 확인하기 위해 RBAC 권한이 필요합니다.
* 해당 권한이 없을 경우 나타나는 로그 메세지의 명령을 실행하면 권한을 부여할 수 잇습니다.

In [22]:
istio_ingress_endpoint = None
try:
    istio_ingress_endpoint = api_client.read_namespaced_service(name='istio-ingressgateway', namespace='istio-system')
    istio_ports = istio_ingress_endpoint.spec.ports
    for istio_port in istio_ports:
        if istio_port.name == "http2":
            logging.warning("get ingress-host by running kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'")
            logging.info(f"Tensorboard URL: <ingress-host>/{namespace}/anonymous/ui/")
except ApiException as e:
    if e.status == 403:
        logging.warning(f"The service account doesn't have sufficient privileges "
                      f"to get the kubeflow service. "
                      f"You will have to manually enter the cluster-ip. "
                      f"To make this function work ask someone with cluster "
                      f"priveleges to create an appropriate "
                      f"clusterrolebinding by running a command.\n"
                      f"kubectl create --namespace=kubeflow rolebinding "
                       "--clusterrole=kubeflow-view "
                       "--serviceaccount=${NAMESPACE}:default-editor "
                       "${NAMESPACE}-minio-view")
        logging.warn("API Access restricted. Please get URL by running the kubectl commands at the end of the notebook")

get worker-node-ip by running 'kubectl get nodes -o wide'
Tensorboard URL: <ingress-host>:31380/cluster-notebook/anonymous/ui/


## Tensorboard URL 확인

* URL을 확인하기 위해 RBAC 권한이 필요합니다.
* **Note:** `kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'` 명령어를 실행하면 kubernetes loadbalancer url을 확인할 수 있습니다.
```bash
export LOADBALANCER_URL=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
printf "Tensorboard URL: \n${INGRESS_HOST}/mnist/{namespace}/tensorboard/\n"
```

## Web UI 접속

* URL을 확인하기 위해 RBAC 권한이 필요합니다.
* **Note:** `kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'` 명령어를 실행하면 kubernetes loadbalancer url을 확인할 수 있습니다.
```bash
export LOADBALANCER_URL=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
printf "mnist-web-app URL: \n${INGRESS_HOST}/mnist/{namespace}/ui/\n"
```