# Building an Open Banking Docker Image

Now it's time to extend the base image we just created (`ludwig-base`, referenced in the Dockerfile below) and build a concrete Docker image containing our model.

Here, we'll prepare a Docker image with an algorithm to classify our text.

We'll use a SageMaker feature called "CustomAttributes" to build a dispatcher mechanism: the value sent in this attribute selects which algorithm inside our container handles the request.
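
To make the dispatcher idea concrete, here is a minimal sketch. SageMaker forwards the `CustomAttributes` value of an `InvokeEndpoint` call to the container in the `X-Amzn-SageMaker-Custom-Attributes` HTTP header. Everything else below is an assumption made for illustration: the `algorithm=<name>` attribute format, the function names, and the placeholder handlers are hypothetical and are not taken from the base image.

```python
# Hypothetical sketch: route a request to a model based on CustomAttributes.
# The "algorithm=<name>" format and the handler names are assumptions.

def parse_custom_attributes(header_value):
    """Parse 'key1=value1,key2=value2' into a dict, skipping malformed pairs."""
    attributes = {}
    for pair in (header_value or "").split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            attributes[key.strip()] = value.strip()
    return attributes


def dispatch(header_value, handlers, default="text_classifier"):
    """Return the handler named by the 'algorithm' attribute, or the default."""
    name = parse_custom_attributes(header_value).get("algorithm", default)
    if name not in handlers:
        raise KeyError(f"Unknown algorithm: {name}")
    return handlers[name]


# Placeholder handlers standing in for real model predict functions.
handlers = {
    "text_classifier": lambda payload: f"classified:{payload}",
    "echo": lambda payload: payload,
}

print(dispatch("algorithm=echo", handlers)("hello"))  # hello
print(dispatch(None, handlers)("hello"))              # classified:hello
```

A missing or empty header falls back to the default handler, so clients that never set `CustomAttributes` keep working.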

## First, let's create a Dockerfile

In [None]:
%%writefile Dockerfile
FROM ludwig-base:latest

COPY model_definition.yml /opt/program

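## Then, let's create the model_definition file

This file tells Ludwig which input and output features the model uses: a `text` input encoded at word level with a parallel CNN, and a categorical `class` output.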

In [None]:
%%writefile model_definition.yml
input_features:
    -
        name: text
        type: text
        level: word
        encoder: parallel_cnn

output_features:
    -
        name: class
        type: category

## Finally, let's create the buildspec
This file will be used by CodeBuild to build our image and push it to Amazon ECR.

In [None]:
%%writefile buildspec.yml
version: 0.2

phases:
  install:
    runtime-versions:
      docker: 18

  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
      # The Dockerfile starts FROM ludwig-base:latest, so pull that base image and tag it locally
      - docker pull $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/ludwig-base:latest
      - docker tag $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/ludwig-base:latest ludwig-base:latest
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
      - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG

  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker image...
      - echo docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
      - echo $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG > image.url
      - echo Done
artifacts:
  files:
    - image.url
  name: image_url
  discard-paths: yes

### Building the image locally, first

In [None]:
!docker build -f Dockerfile -t ludwig_openbanking:1.0 .

# Let's do some tests, locally
## First, let's define some hyperparameters

In [None]:
# TODO: Let's apply these to the 'training' parameters section if required
hyperparameters = {
    "epochs": 100,
    "batch_size": 128,
}

In [None]:
import json
!mkdir -p input/config

# SageMaker passes hyperparameters to the container as strings
hyperparameters = {key: str(value) for key, value in hyperparameters.items()}
# The with block flushes and closes the file on exit
with open('input/config/hyperparameters.json', 'w') as f:
    json.dump(hyperparameters, f)
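
Because every value in `hyperparameters.json` is a string, the training code has to cast values back before using them. The sketch below shows one way this could be done. The function name `load_hyperparameters` and the default values are hypothetical, and the demo writes its own temporary file rather than reading the one created above. Inside the container the file would be found at `/opt/ml/input/config/hyperparameters.json`, given the volume mounts used later in this notebook.

```python
import json
import os
import tempfile


def load_hyperparameters(path, defaults):
    """Read string-valued hyperparameters and cast each to the type of its default.

    Only int, float and str defaults are handled correctly; bool("False") is
    True in Python, so booleans would need explicit parsing.
    """
    with open(path) as f:
        raw = json.load(f)
    params = dict(defaults)
    for key, default in defaults.items():
        if key in raw:
            params[key] = type(default)(raw[key])
    return params


# Demo with a temporary file that mimics the string-valued JSON.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hyperparameters.json")
    with open(path, "w") as f:
        json.dump({"epochs": "100", "batch_size": "128"}, f)
    params = load_hyperparameters(
        path, {"epochs": 10, "batch_size": 32, "learning_rate": 0.001}
    )

print(params)  # {'epochs': 100, 'batch_size': 128, 'learning_rate': 0.001}
```

Keys missing from the file keep their default, which is why `learning_rate` survives unchanged in the demo.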

## Then, let's prepare a dataset


In [None]:
# Remove only the data folder so that input/config/hyperparameters.json is kept
!rm -Rf input/data
!mkdir -p input/data/training input/data/test

!aws s3 cp s3://open-banking-classificaiton-ap-southeast-2/open-banking-test.csv input/data/training/train.csv
!aws s3 cp s3://open-banking-classificaiton-ap-southeast-2/open-banking-test.csv input/data/test/test.csv
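
Before training, it can help to confirm that the CSV header contains the columns named in `model_definition.yml` (`text` and `class`). The following is a small sketch of such a check. It assumes the files have a header row; the helper name `missing_columns` and the sample rows are made up for illustration.

```python
import csv
import io

# Column names taken from model_definition.yml.
REQUIRED_COLUMNS = {"text", "class"}


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header, sorted."""
    header = next(csv.reader(csv_file), [])
    return sorted(set(required) - {name.strip() for name in header})


# In-memory samples standing in for the downloaded files.
good = io.StringIO('text,class\n"coffee shop purchase",dining\n')
bad = io.StringIO("description,class\n")

print(missing_columns(good))  # []
print(missing_columns(bad))   # ['text']
```

To check the real file, the same function could be called as `missing_columns(open('input/data/training/train.csv', newline=''))`.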

## Then, let's test the training process

Make model directory and clear any existing files

In [None]:
!mkdir -p model
!rm -Rf model/*

In [None]:
print("Training ...")
!docker run --rm --name 'ludwig_openbanking_train' \
    -v "$PWD/model:/opt/ml/model" \
    -v "$PWD/input:/opt/ml/input" ludwig_openbanking:1.0 train

In [None]:
!ls model

## Now, a basic test with a direct call to our container

In [None]:
!rm -Rf output
!mkdir -p output/data

In [None]:
print("Testing ...")
!docker run --rm --name 'ludwig_openbanking_test' \
    -v "$PWD/model:/opt/ml/model" \
    -v "$PWD/output:/opt/ml/output" \
    -v "$PWD/input:/opt/ml/input" ludwig_openbanking:1.0 test \
        '/opt/ml/input/data/test/test.csv' \
        '/opt/ml/output/data/predictions.csv'

In [None]:
!wc -l output/data/predictions.csv

## This is the serving test. It simulates an Endpoint exposed by SageMaker

The next cell blocks this Jupyter notebook while the container is running. A web service will be exposed on port 8080.

In [None]:
!docker run --rm --name 'ludwig_openbanking_serve' \
    -p 8080:8080 \
    -v "$PWD/model:/opt/ml/model" \
    -v "$PWD/input:/opt/ml/input" ludwig_openbanking:1.0 serve

> While the above cell is running, click here [TEST NOTEBOOK](02_Testing%20our%20local%20model%20server.ipynb) to run some tests.

> After you finish the tests, press **STOP**
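
For reference, a request to the local server could be built as in the sketch below and sent from a separate notebook or terminal while the container is still running. SageMaker-compatible containers answer `GET /ping` and `POST /invocations`. The JSON payload shape is an assumption, since the format expected by this container is defined in the base image; `build_invocation` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # port published by `docker run -p 8080:8080`


def build_invocation(payload, content_type="application/json", custom_attributes=None):
    """Build (but do not send) a POST request for the /invocations route."""
    headers = {"Content-Type": content_type}
    if custom_attributes:
        # Header SageMaker uses to forward CustomAttributes to the container.
        headers["X-Amzn-SageMaker-Custom-Attributes"] = custom_attributes
    return urllib.request.Request(
        f"{BASE_URL}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# The payload below is an assumed shape, not the container's documented format.
request = build_invocation({"text": "coffee shop purchase"})
print(request.get_method(), request.full_url)  # POST http://localhost:8080/invocations

# To send it while the container is running:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the headers and body before touching the server.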

### Before we push our code to the repo, let's check the building process

In [None]:
import boto3

sts_client = boto3.client("sts")
session = boto3.session.Session()

account_id = sts_client.get_caller_identity()["Account"]
region = session.region_name
credentials = session.get_credentials()
credentials = credentials.get_frozen_credentials()

repo_name='ludwig-openbanking'
image_tag='test'

In [None]:
!mkdir -p tests
!cp model_definition.yml Dockerfile buildspec.yml tests/
with open('tests/vars.env', 'w') as f:
    f.write("AWS_ACCOUNT_ID=%s\n" % account_id)
    f.write("IMAGE_TAG=%s\n" % image_tag)
    f.write("IMAGE_REPO_NAME=%s\n" % repo_name)
    f.write("AWS_DEFAULT_REGION=%s\n" % region)
    f.write("AWS_ACCESS_KEY_ID=%s\n" % credentials.access_key)
    f.write("AWS_SECRET_ACCESS_KEY=%s\n" % credentials.secret_key)
    f.write("AWS_SESSION_TOKEN=%s\n" % credentials.token)

!cat tests/vars.env
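
Printing `tests/vars.env` with `cat` leaves the temporary credentials in the notebook output. If that is a concern, the file could be displayed with the secret values masked, as in this sketch. The helper name `mask_env_lines` and the sample lines are hypothetical.

```python
# Variables whose values should not appear in notebook output.
SECRET_KEYS = {"AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"}


def mask_env_lines(lines, secret_keys=SECRET_KEYS):
    """Return KEY=VALUE lines with the values of secret keys replaced by ****."""
    masked = []
    for line in lines:
        key, sep, value = line.rstrip("\n").partition("=")
        if sep and key in secret_keys:
            value = "****"
        masked.append(f"{key}{sep}{value}")
    return masked


# Made-up sample lines standing in for the real file contents.
sample = ["AWS_DEFAULT_REGION=ap-southeast-2\n", "AWS_SECRET_ACCESS_KEY=abc123\n"]
print("\n".join(mask_env_lines(sample)))
```

With the real file, the call would be `mask_env_lines(open('tests/vars.env'))`.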

In [None]:
%%time

!/tmp/aws-codebuild/local_builds/codebuild_build.sh \
    -a "$PWD/tests/output" \
    -s "$PWD/tests" \
    -i "samirsouza/aws-codebuild-standard:2.0" \
    -e "$PWD/tests/vars.env" \
    -c

## Ok, now it's time to push everything to the repo

In [None]:
%%bash

cd ../../../mlops-workshop-images/iris_model
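# The Dockerfile copies model_definition.yml, so it must be in the repo with the build files
cp "$OLDPWD/buildspec.yml" "$OLDPWD/model_definition.yml" "$OLDPWD/Dockerfile" .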

git add --all
git commit -a -m " - files for building the open banking model image"
git push

### Ok, now open the AWS console in another tab and go to the CodePipeline console to see the status of our build pipeline