# End-to-end experiment: GitHub Issue Summarization

Currently, this notebook must be run from the Kubeflow JupyterHub installation, as described in the codelab.

In this notebook, we will show how to:

* Interactively define a Kubeflow pipeline using the Kubeflow Pipelines Python SDK
* Submit and run the pipeline
* Add a step to the pipeline

This example pipeline trains a [Tensor2Tensor](https://github.com/tensorflow/tensor2tensor/) model on GitHub issue data, learning to predict issue titles from issue bodies. It then exports the trained model and deploys it to [TensorFlow Serving](https://github.com/tensorflow/serving).
The final step in the pipeline launches a web app that queries the TF-Serving instance for model predictions.
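The stages above form a simple dependency chain. As a toy model (plain standard-library Python, not the Kubeflow Pipelines SDK), the sketch below represents each stage as a node in a graph and shows that adding a step only means declaring which existing steps it must run after. The step names, including the extra `evaluate` step, are hypothetical labels chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical step names for the stages described above.
# Each step maps to the set of steps it depends on.
PIPELINE = {
    "train": set(),
    "export": {"train"},
    "serve": {"export"},
    "webapp": {"serve"},
}


def execution_order(steps: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(steps).static_order())


def add_step(steps: dict[str, set[str]], name: str,
             after: set[str]) -> dict[str, set[str]]:
    """Return a copy of the graph with a new step that runs after `after`."""
    missing = after - steps.keys()
    if missing:
        raise ValueError(f"unknown upstream steps: {sorted(missing)}")
    new_steps = {step: set(deps) for step, deps in steps.items()}
    new_steps[name] = set(after)
    return new_steps


print(execution_order(PIPELINE))  # ['train', 'export', 'serve', 'webapp']

# Adding a (hypothetical) evaluation step that depends on the exported model
extended = add_step(PIPELINE, "evaluate", {"export"})
print(execution_order(extended))
```

In the real SDK, each stage would be a `dsl.ContainerOp`, and the ordering would come from the dependencies between those ops rather than from an explicit dictionary.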

## Environment Setup

Before we can run any experiment, we need to set up and initialize the environment: install and import the required Python modules, configure the experiment, and build the container images that the pipeline uses.

Setting up Python modules

In [None]:
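# Install the Kubeflow Pipelines SDK and the local notebook extensions
!pip3 install --upgrade 'https://storage.googleapis.com/ml-pipeline/release/0.1.10/kfp.tar.gz' > /dev/null
!pip3 install --upgrade './extensions' > /dev/null
# Load the notebook magics provided by the extensions package (e.g. %%template)
%load_ext extensions

import sys
# Make the modules under src/ importable
sys.path.insert(0, 'src')

import kfp
import kfp.dsl as dsl
import kfp.gcp as gcp
import kfp.notebook

from ipython_secrets import get_secret
from kfp.compiler import Compiler

import extensions
import extensions.kaniko as kaniko
from os import environ

# Client for the Kubeflow Pipelines API
client = kfp.Client()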

In [None]:
USER = environ['JUPYTERHUB_USER']
EXPERIMENT_NAME = f'Github issues {USER}'
DOCKER_REGISTRY = get_secret('DOCKER_REGISTRY')
DOCKER_REGISTRY_SECRET = get_secret('DOCKER_REGISTRY_SECRET')
DOCKER_TAG = 'latest'

# S3 bucket that stores the Docker build context for Kaniko
AWS_S3_BUCKET = 'files.dev4.demo10.superhub.io'

DATA_FILE = '/home/jovyan/data/data-sample.csv'

# Reuse the experiment if it already exists, otherwise create it
try:
    exp = client.get_experiment(experiment_name=EXPERIMENT_NAME)
except Exception:
    exp = client.create_experiment(EXPERIMENT_NAME)
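The `try`/`except` above is a get-or-create pattern: look the experiment up by name and create it only if the lookup fails. The sketch below isolates that pattern so it runs without a Kubeflow cluster. `FakeClient`, `Experiment`, and `NotFound` are hypothetical stand-ins, not the real `kfp.Client` API; the exact exception the real `get_experiment` raises for a missing experiment may differ between SDK versions, which is why the notebook's handler is broad.

```python
from dataclasses import dataclass


class NotFound(Exception):
    """Hypothetical 'experiment does not exist' error."""


@dataclass
class Experiment:
    id: str
    name: str


class FakeClient:
    """Hypothetical in-memory stand-in for kfp.Client (not the real API)."""

    def __init__(self):
        self.experiments = {}

    def get_experiment(self, experiment_name):
        if experiment_name not in self.experiments:
            raise NotFound(experiment_name)
        return self.experiments[experiment_name]

    def create_experiment(self, name):
        experiment = Experiment(id=f"exp-{len(self.experiments) + 1}", name=name)
        self.experiments[name] = experiment
        return experiment


def get_or_create_experiment(client, name):
    """Return the experiment called `name`, creating it only if it is missing."""
    try:
        return client.get_experiment(experiment_name=name)
    except NotFound:
        return client.create_experiment(name)


fake_client = FakeClient()
first = get_or_create_experiment(fake_client, "Github issues alice")
second = get_or_create_experiment(fake_client, "Github issues alice")
print(first.id == second.id)  # True: the second call reuses the experiment
```

Catching a specific exception, as here, avoids masking unrelated failures such as network or authentication errors, which a broad handler would silently turn into a `create_experiment` call.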

In [None]:
%%template Dockerfile.keras
FROM tensorflow/tensorflow:latest-py3
COPY src /app
WORKDIR /app
RUN pip3 install --upgrade --no-cache-dir -r requirements.txt
ENTRYPOINT ["python3"]
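The `ENTRYPOINT` instruction must use double quotes. Docker treats its argument as exec form only when it parses as a JSON array, and JSON requires double-quoted strings; anything else falls back to shell form and is run through `/bin/sh -c`. The Python sketch below approximates that rule with `json.loads`. It is a simplified model for illustration, not Docker's actual parser, and `entrypoint_form` is a hypothetical name.

```python
import json


def entrypoint_form(value: str):
    """Roughly classify an ENTRYPOINT argument the way Docker does.

    Returns ("exec", argv) when `value` is a JSON array of strings,
    otherwise ("shell", ["/bin/sh", "-c", value]).
    """
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, list) and all(isinstance(arg, str) for arg in parsed):
        return "exec", parsed
    return "shell", ["/bin/sh", "-c", value]


print(entrypoint_form('["python3"]'))  # ('exec', ['python3'])
print(entrypoint_form("['python3']"))  # ('shell', ['/bin/sh', '-c', "['python3']"])
```

With single quotes, the container would try to execute the literal text `['python3']` as a shell command, which fails at startup.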

In [None]:
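# Package the local build context and upload it to S3, where Kaniko can fetch it.
# upload_build_context_to_s3() and use_aws_region_envvar() are not defined in this
# notebook; they are assumed to be provided by the `extensions` package loaded above.
build_ctx = f"s3://{AWS_S3_BUCKET}/{USER}/{EXPERIMENT_NAME}/dockerbuild.tar.gz"
upload_build_context_to_s3(build_ctx)

def kaniko_op(name, destination, dockerfile,
              context=build_ctx, aws_secret=None,
              pull_secret=DOCKER_REGISTRY_SECRET):
    """Template for a Kaniko image-build step.

    Builds `dockerfile` from the build context at `context` and pushes the
    resulting image to `destination`. `aws_secret` is accepted but not used yet.
    """
    return dsl.ContainerOp(
        name=name,
        image='gcr.io/kaniko-project/executor:latest',
        arguments=['--destination', destination,
                   '--dockerfile', dockerfile,
                   '--context', context]
    ).apply(
        # Expose the AWS region to the container so Kaniko can read the S3 context
        use_aws_region_envvar()
    ).apply(
        # Project the registry secret into the container for registry authentication
        kaniko.use_pull_secret_projection(pull_secret)
    )

@dsl.pipeline(
    name='Pipeline images',
    description='Build images that will be used by the pipeline'
)
def build_images():
    keras_image = kaniko_op(
        name='keras',
        destination=f"{DOCKER_REGISTRY}/library/keras:{DOCKER_TAG}",
        dockerfile='Dockerfile.keras'
    )

# Compile the pipeline into a package that can be submitted to Kubeflow Pipelines
Compiler().compile(build_images, 'kaniko.tar.gz')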
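`upload_build_context_to_s3` is not defined in this notebook, so its exact behavior is unknown here. Kaniko does expect an S3 build context to be a gzipped tarball, so the sketch below shows only the packaging half of such a helper, assuming it archives the `src/` folder and the rendered `Dockerfile.keras` with paths relative to the context root. The function name `make_build_context` and the file layout are hypothetical, and the upload step is omitted.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_build_context(root: Path, paths: list[str]) -> bytes:
    """Return a gzipped tarball containing `paths`, stored relative to `root`.

    Hypothetical helper: mirrors what a build-context uploader might package
    before copying the bytes to s3://<bucket>/.../dockerbuild.tar.gz.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for rel in paths:
            # arcname keeps entries relative, so Dockerfile.keras sits at the
            # archive root and `COPY src /app` resolves inside the context.
            # Directories are added recursively.
            tar.add(root / rel, arcname=rel)
    return buf.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "requirements.txt").write_text("keras\n")
    (root / "Dockerfile.keras").write_text("FROM tensorflow/tensorflow:latest-py3\n")
    data = make_build_context(root, ["Dockerfile.keras", "src"])

with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
    print(sorted(tar.getnames()))
# ['Dockerfile.keras', 'src', 'src/requirements.txt']
```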

In [None]:
# Submit the compiled image-build pipeline as a run in the experiment created above
run = client.run_pipeline(exp.id, 'Build images', 'kaniko.tar.gz')
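`run_pipeline` uploads the `.tar.gz` package written by `Compiler().compile`. To our understanding, SDK releases of this era store a single workflow spec (`pipeline.yaml`) inside that archive; assuming so, the sketch below reads the spec back so the compiled workflow can be inspected before or after submission. `read_pipeline_spec` is a hypothetical helper, and the test package is built by hand rather than by the compiler.

```python
import io
import tarfile


def read_pipeline_spec(package: bytes) -> str:
    """Return the text of the single file inside a compiled pipeline package.

    Assumes the archive holds exactly one regular file (e.g. pipeline.yaml).
    """
    with tarfile.open(fileobj=io.BytesIO(package), mode="r:gz") as tar:
        files = [m for m in tar.getmembers() if m.isfile()]
        if len(files) != 1:
            raise ValueError(f"expected one file, found {len(files)}")
        return tar.extractfile(files[0]).read().decode("utf-8")


def _fake_package(text: str) -> bytes:
    """Build a stand-in package by hand (hypothetical content, not compiler output)."""
    payload = text.encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="pipeline.yaml")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


spec = read_pipeline_spec(_fake_package("kind: Workflow\n"))
print(spec)
```

In the notebook itself, reading `kaniko.tar.gz` in binary mode and passing its bytes to `read_pipeline_spec` would print the compiled workflow, provided the archive layout assumption holds.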

In [None]:
# Block until the run finishes (timeout in seconds), then show its status
client.wait_for_run_completion(run.id, timeout=720).run.status
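`wait_for_run_completion` blocks until the run reaches a final state or the timeout expires. The generic sketch below shows that polling pattern with an injected `get_status` callable, so it can run without a cluster. It is not the KFP implementation: `wait_for_status` is a hypothetical name, and the set of terminal states is an assumption based on Argo workflow phases.

```python
import time
from typing import Callable, Iterable

# Assumed terminal phases (Argo workflows report Succeeded, Failed or Error)
TERMINAL_STATES = frozenset({"Succeeded", "Failed", "Error"})


def wait_for_status(get_status: Callable[[], str], timeout: float,
                    poll_interval: float = 5.0,
                    terminal: Iterable[str] = TERMINAL_STATES) -> str:
    """Poll get_status() until it returns a terminal state or `timeout` seconds pass."""
    terminal = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} s")
        time.sleep(poll_interval)


# Simulated run that reports Running twice before succeeding
statuses = iter(["Running", "Running", "Succeeded"])
final = wait_for_status(lambda: next(statuses), timeout=10, poll_interval=0.01)
print(final)  # Succeeded
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the wall clock is adjusted while waiting.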