# BERT Pipeline

## Prerequisites

#### Install the following packages before running the build script

1. kfp - `! pip install kfp`
2. docker - log in first with `docker login`
3. yq - install with snap; see https://github.com/mikefarah/yq
4. jq - v1.6 - https://stedolan.github.io/jq/download/

#### `./build.sh` takes two arguments

1. Path to the example, e.g. `examples/cifar10`
2. Docker Hub username, e.g. `shrinathsuresh`
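Before invoking the build script, it can help to confirm that the command-line prerequisites are actually on `PATH`. The following is a minimal sketch, not part of the original notebook; `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and the check only looks for the executables (it does not verify versions or that `docker login` has been run).

```python
import shutil

# Hypothetical list: the CLI tools named in the prerequisites above.
REQUIRED_TOOLS = ["docker", "yq", "jq"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All command-line prerequisites were found on PATH.")
```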


## Install Packages

In [2]:
!pip install captum torchvision matplotlib pillow flask flask_compress

**Make sure the current working directory is "pytorch_pipeline"**

In [73]:
import os
os.getcwd()

'/home/ubuntu/Repositories/fb/pytorch-pipeline/pytorch_pipeline'
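Rather than reading the printed path by eye, the directory check can be made explicit. This is a small sketch under the assumption that only the final path component matters; `in_expected_dir` is a hypothetical helper, not something provided by the repository.

```python
from pathlib import Path


def in_expected_dir(path, expected="pytorch_pipeline"):
    """Return True when the last component of `path` is the expected directory name."""
    return Path(path).name == expected


cwd = Path.cwd()
if not in_expected_dir(cwd):
    # Only warn here; changing directory is left to the reader.
    print(f"Warning: current directory is {cwd}, expected one named 'pytorch_pipeline'")
```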

## Generate Pipeline file

In [1]:
! ./build.sh examples/bert docker-username

## At the end of this step, the `pytorch_bert.yaml` file is generated
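A quick way to confirm that the build step produced its output is to check for the file before moving on. The sketch below assumes the file lands in the current working directory under the name given above; `pipeline_file_ready` is a hypothetical helper and it does not validate the YAML contents.

```python
from pathlib import Path


def pipeline_file_ready(path="pytorch_bert.yaml"):
    """Return True when `path` exists as a regular, non-empty file."""
    candidate = Path(path)
    return candidate.is_file() and candidate.stat().st_size > 0


print("Pipeline file ready:", pipeline_file_ready())
```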

In [36]:
import kfp
import json
import os
from PIL import Image
from kfp import components
from kfp.components import load_component_from_file, load_component_from_url
from kfp import dsl
from kfp import compiler
from pathlib import Path

kfp.__version__

'1.6.0-rc.0'

## Enter your KFP URL and the token from the cookie
[Use this Chrome extension to get the token](https://chrome.google.com/webstore/detail/editthiscookie/fngmhnnpilhplaeedifhccceomclgfbg?hl=en)
![image.png](image.png)

## Set Pipeline URL, Cookie, Experiment and Namespace

In [49]:
# KFP_URL='istio-ingressgateway.istio-system.svc.cluster.local'
KFP_URL='http://localhost:8080'
COOKIE="Add Cookie here"
AUTH="authservice_session="+COOKIE
NAMESPACE="kubeflow-user-example-com"
EXPERIMENT="Default"
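It is easy to run the cell above with the placeholder cookie still in place, which only surfaces later as an authentication failure. The sketch below builds the same `authservice_session=` string but rejects the placeholder up front; `build_auth_cookie` and the example session value are hypothetical and used for illustration only.

```python
PLACEHOLDER = "Add Cookie here"  # the placeholder text used in the cell above


def build_auth_cookie(cookie, name="authservice_session"):
    """Return 'name=cookie', refusing an empty or placeholder cookie value."""
    cookie = cookie.strip()
    if not cookie or cookie == PLACEHOLDER:
        raise ValueError("Set COOKIE to the session token copied from the browser")
    return f"{name}={cookie}"


# Illustrative value only; a real token comes from the browser cookie.
print(build_auth_cookie("example-session-value"))
```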

## Set Log bucket and Tensorboard Image

In [50]:
MINIO_ENDPOINT="http://minio-service.kubeflow:9000"
LOG_BUCKET="mlpipeline"
TENSORBOARD_IMAGE="jagadeeshj/tb_plugin:v1.8"

## Set Inference parameters

In [60]:
MODEL_NAME="bert"
DEPLOY_NAME="bertserve"
ISVC_NAME=DEPLOY_NAME+"."+NAMESPACE+"."+"example.com"
INFERENCE_URL="http://istio-ingressgateway.istio-system.svc.cluster.local"
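The `ISVC_NAME` value is just the deployment name, the namespace and a domain joined with dots. As a sketch, the same construction can be written as a small function; `isvc_hostname` is a hypothetical name, and the `example.com` default simply mirrors the value used in the cell above.

```python
def isvc_hostname(deploy_name, namespace, domain="example.com"):
    """Join deployment name, namespace and domain into the inference service host."""
    return ".".join([deploy_name, namespace, domain])


print(isvc_hostname("bertserve", "kubeflow-user-example-com"))
```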

## Create KFP Client and create an experiment

In [40]:
client = kfp.Client(host=KFP_URL+"/pipeline", cookies=AUTH)
client.create_experiment(name=EXPERIMENT, namespace=NAMESPACE)
experiments = client.list_experiments(namespace=NAMESPACE)
my_experiment = experiments.experiments[0]
my_experiment

{'created_at': datetime.datetime(2021, 4, 22, 8, 44, 39, tzinfo=tzutc()),
 'description': None,
 'id': 'aac96a63-616e-4d88-9334-6ca8df2bb956',
 'name': 'Default',
 'resource_references': [{'key': {'id': 'kubeflow-user-example-com',
                                  'type': 'NAMESPACE'},
                          'name': None,
                          'relationship': 'OWNER'}],
 'storage_state': 'STORAGESTATE_AVAILABLE'}
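Taking `experiments.experiments[0]` works when the namespace holds a single experiment, but it may pick a different one otherwise. A hedged alternative is to select by name. The helper below is a sketch that assumes each entry exposes a `.name` attribute, as the output above suggests; `find_experiment` is a hypothetical name and the demo objects are stand-ins, not real KFP objects.

```python
from types import SimpleNamespace


def find_experiment(experiments, name):
    """Return the first experiment whose .name matches, or None if there is none."""
    for experiment in experiments or []:
        if experiment.name == name:
            return experiment
    return None


# Stand-in objects for illustration; the second id is copied from the output above.
demo_experiments = [
    SimpleNamespace(id="hypothetical-other-id", name="Other"),
    SimpleNamespace(id="aac96a63-616e-4d88-9334-6ca8df2bb956", name="Default"),
]
print(find_experiment(demo_experiments, "Default").id)
```

In the notebook this would be called as `find_experiment(experiments.experiments, EXPERIMENT)`.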

## Pipeline params

In [41]:
pipeline_params = {
    "minio_endpoint" : MINIO_ENDPOINT,
    "tf_image" : TENSORBOARD_IMAGE,
    "log_bucket" : LOG_BUCKET,
    "namespace" : NAMESPACE,
    "deploy" : DEPLOY_NAME
}

## Click on Run Details to navigate to the pipeline run

In [42]:
run_name = 'pytorch-bert'
# Execute pipeline
run = client.run_pipeline(my_experiment.id, run_name, "pytorch_bert.yaml", pipeline_params)

In [61]:
INFERENCE_SERVICE_LIST = ! kubectl get isvc $DEPLOY_NAME -n $NAMESPACE -o json | jq .status.url | tr -d '"'| cut -d "/" -f 3
INFERENCE_SERVICE_NAME = INFERENCE_SERVICE_LIST[0]
INFERENCE_SERVICE_NAME

'bertserve.kubeflow-user-example-com.example.com'
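The `jq | tr | cut` chain above extracts the host part of `status.url`. For readers who prefer to do this in Python, here is a minimal sketch that parses the same JSON; `isvc_host` is a hypothetical helper and the sample document is a trimmed stand-in containing only the `status.url` field.

```python
import json
from urllib.parse import urlparse


def isvc_host(isvc_json):
    """Return the host portion of .status.url from `kubectl get isvc -o json` output."""
    url = json.loads(isvc_json)["status"]["url"]
    return urlparse(url).netloc


# Trimmed stand-in for the kubectl output; only the field used here is included.
sample = '{"status": {"url": "http://bertserve.kubeflow-user-example-com.example.com"}}'
print(isvc_host(sample))
```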

In [3]:
!curl -v -H "Host: $INFERENCE_SERVICE_NAME" -H "Cookie: $AUTH" "$KFP_URL/v1/models/$MODEL_NAME:predict" -d @./examples/bert/sample.txt > ./bert_prediction_output.json
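The `curl` call above sends the sample file to the `:predict` endpoint with a `Host` header that selects the inference service and a `Cookie` header for authentication. The sketch below builds an equivalent request object with the standard library, without sending it; `build_inference_request` is a hypothetical helper, and the payload and cookie values are placeholders.

```python
import urllib.request


def build_inference_request(base_url, model_name, verb, host, auth_cookie, payload):
    """Build (but do not send) a POST request for the predict or explain endpoint."""
    url = f"{base_url}/v1/models/{model_name}:{verb}"
    headers = {"Host": host, "Cookie": auth_cookie}
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


request = build_inference_request(
    "http://localhost:8080",
    "bert",
    "predict",
    "bertserve.kubeflow-user-example-com.example.com",
    "authservice_session=example-session-value",  # placeholder cookie
    b"{}",  # placeholder body; the notebook sends examples/bert/sample.txt
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; swapping `"predict"` for `"explain"` targets the explanation endpoint used further down.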

In [64]:
! cat ./bert_prediction_output.json

{"predictions": ["\"Sci/Tech\""]}
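Note that the predicted label in this response carries its own embedded quotes (`"\"Sci/Tech\""`), which is why it later appears quoted in the visualization. A small sketch for recovering the bare label follows; `predicted_label` is a hypothetical helper and the sample string is the response shown above.

```python
import json


def predicted_label(response_text):
    """Return the first prediction with any surrounding double quotes removed."""
    raw = json.loads(response_text)["predictions"][0]
    return raw.strip('"')


sample_response = r'{"predictions": ["\"Sci/Tech\""]}'
print(predicted_label(sample_response))  # prints: Sci/Tech
```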

In [4]:
!curl -v -H "Host: $INFERENCE_SERVICE_NAME" -H "Cookie: $AUTH" "$KFP_URL/v1/models/$MODEL_NAME:explain" -d @./examples/bert/sample.txt  > bert_explaination_output.json

In [67]:
! cat bert_explaination_output.json

{"explanations": [{"words": ["[CLS]", "bloomberg", "has", "reported", "on", "the", "economy", "[SEP]"], "importances": [0.49803317807827413, -0.04228915625436579, -0.22691037181108395, 0.15573719339552444, 0.08677259891698845, 0.1791962203959244, 0.525546079847318, -0.5988261343532961], "delta": 0.12081549835977756}]}

In [68]:
import json
# Use a context manager so the file handle is closed after reading
with open("./bert_explaination_output.json", "r") as f:
    explanations_json = json.load(f)
explanations_json

{'explanations': [{'words': ['[CLS]',
    'bloomberg',
    'has',
    'reported',
    'on',
    'the',
    'economy',
    '[SEP]'],
   'importances': [0.49803317807827413,
    -0.04228915625436579,
    -0.22691037181108395,
    0.15573719339552444,
    0.08677259891698845,
    0.1791962203959244,
    0.525546079847318,
    -0.5988261343532961],
   'delta': 0.12081549835977756}]}
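To get a feel for the explanation before visualizing it, the tokens can be ranked by the magnitude of their importance, and the importances summed. The sketch below uses the values from the output above; `rank_tokens` is a hypothetical helper. The sum rounds to 0.58, the figure that appears as the attribution score in the visualization further down.

```python
def rank_tokens(explanation):
    """Return (word, importance) pairs sorted by absolute importance, largest first."""
    pairs = zip(explanation["words"], explanation["importances"])
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Values copied from the explain response above.
explanation = {
    "words": ["[CLS]", "bloomberg", "has", "reported", "on", "the", "economy", "[SEP]"],
    "importances": [0.49803317807827413, -0.04228915625436579, -0.22691037181108395,
                    0.15573719339552444, 0.08677259891698845, 0.1791962203959244,
                    0.525546079847318, -0.5988261343532961],
}

for word, importance in rank_tokens(explanation):
    print(f"{word:>10s}  {importance:+.3f}")
print("total attribution:", round(sum(explanation["importances"]), 2))
```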

In [69]:
# Use a context manager so the file handle is closed after reading
with open("./bert_prediction_output.json", "r") as f:
    prediction_json = json.load(f)

In [70]:
import torch
attributions = explanations_json["explanations"][0]['importances']
tokens = explanations_json["explanations"][0]['words']
delta = explanations_json["explanations"][0]['delta']

attributions = torch.tensor(attributions)
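pred_prob = 0.75  # hard-coded: the prediction response above contains no probability
pred_class = prediction_json["predictions"][0]
true_class = "Business"  # label displayed as the true class
attr_class = "world"  # label displayed as the attribution class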

In [71]:
from captum.attr import visualization
vis_data_records = []
vis_data_records.append(visualization.VisualizationDataRecord(
    attributions,        # per-token attributions
    pred_prob,
    pred_class,
    true_class,
    attr_class,
    attributions.sum(),  # overall attribution score
    tokens,
    delta))              # convergence delta

In [72]:
visualization.visualize_text(vis_data_records)

| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
|---|---|---|---|---|
| Business | "Sci/Tech" (0.75) | world | 0.58 | [CLS] bloomberg has reported on the economy [SEP] |




## Clean up
### Delete Viewers, Inference Services and Completed pods

In [7]:
! kubectl delete --all viewers -n $NAMESPACE

In [6]:
! kubectl delete --all isvc -n $NAMESPACE

In [5]:
! kubectl delete pod --field-selector=status.phase==Succeeded -n $NAMESPACE