# Building a Model Docker Image

Now it's time to extend the abstract image we just created for Ludwig algorithms and implement a concrete Docker image with our algorithms/models.

Here, we'll prepare a Docker image with an algorithm that classifies our text.

In [None]:
import os

base_repo = os.environ['BASE_REPO']
print('base repo: {}'.format(base_repo))

## First, let's create a Dockerfile

The Dockerfile inherits from the base image in `$BASE_REPO` and copies the model definition into `/opt/program`.

In [None]:
print('Writing Dockerfile')

with open('Dockerfile', 'w') as f:
    f.write('''FROM {}:latest

COPY model_definition.yml /opt/program
'''.format(base_repo))
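
The same template can be rendered by a small helper, which makes it easy to check the generated text before writing it to disk. This is only a minimal sketch: `render_dockerfile` and the repository name `my-base-repo` are hypothetical and not part of the workshop code.

```python
def render_dockerfile(base_repo, tag='latest'):
    """Return Dockerfile text that extends base_repo:tag (hypothetical helper)."""
    lines = [
        'FROM {}:{}'.format(base_repo, tag),
        '',
        'COPY model_definition.yml /opt/program',
        '',
    ]
    return '\n'.join(lines)


# 'my-base-repo' is a placeholder; in the notebook the value comes from BASE_REPO
dockerfile_text = render_dockerfile('my-base-repo')
print(dockerfile_text)
```

Keeping the tag as a parameter would allow pinning the base image to something other than `latest` if that is ever needed.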

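## Then, let's create the model_definition file

The model definition tells Ludwig which features to use: a `text` input, encoded at word level with the `parallel_cnn` encoder, and a `class` output of type `category`.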

In [None]:
%%writefile model_definition.yml
input_features:
    -
        name: text
        type: text
        level: word
        encoder: parallel_cnn

output_features:
    -
        name: class
        type: category
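
To make the structure of this file explicit, here is the same definition written as a Python dict, together with a basic shape check. The `check_features` helper is hypothetical; it only verifies that each feature has a `name` and a `type`, and it is not a substitute for Ludwig's own validation.

```python
# The same definition as a Python dict (mirrors model_definition.yml above)
model_definition = {
    'input_features': [
        {'name': 'text', 'type': 'text', 'level': 'word', 'encoder': 'parallel_cnn'},
    ],
    'output_features': [
        {'name': 'class', 'type': 'category'},
    ],
}


def check_features(definition):
    """Return a list of problems found; an empty list means the basic shape is fine."""
    problems = []
    for section in ('input_features', 'output_features'):
        features = definition.get(section) or []
        if not features:
            problems.append('{} is missing or empty'.format(section))
        for index, feature in enumerate(features):
            for key in ('name', 'type'):
                if key not in feature:
                    problems.append('{}[{}] has no "{}"'.format(section, index, key))
    return problems


print(check_features(model_definition))  # prints []
```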

## Finally, let's create the buildspec

CodeBuild will use this file to pull the base image, build our model image, and push it to Amazon ECR.

In [None]:
%%writefile buildspec.yml
version: 0.2

phases:
  install:
    runtime-versions:
      docker: 18

  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
      - docker pull $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$BASE_REPO_NAME:$IMAGE_TAG
      - docker tag $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$BASE_REPO_NAME:$IMAGE_TAG $BASE_REPO_NAME:$IMAGE_TAG
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
      - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG

  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker image...
      - echo docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
      - echo $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG > image.url
      - cat image.url
      - echo Done
artifacts:
  files:
    - image.url
  name: image_url
  discard-paths: yes
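
The buildspec composes the same ECR image URI several times from its environment variables. The sketch below shows that composition in Python; `ecr_image_uri` is a hypothetical helper and the account, region, and repository values are placeholders.

```python
def ecr_image_uri(account_id, region, repo_name, tag):
    """Compose the fully qualified ECR image URI used for docker tag and docker push."""
    registry = '{}.dkr.ecr.{}.amazonaws.com'.format(account_id, region)
    return '{}/{}:{}'.format(registry, repo_name, tag)


# Placeholder values for illustration only
print(ecr_image_uri('123456789012', 'us-east-1', 'my-model-repo', 'latest'))
# prints 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model-repo:latest
```

This is the value that the `post_build` phase writes to `image.url`.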

### Building the image locally, first

In [None]:
!docker build -f Dockerfile -t $IMAGE_REPO:${IMAGE_TAG:-latest} .

# Let's do some tests, locally
## First, let's define some hyperparameters for training

In [None]:
# TODO: Let's apply these to the 'training' parameters section if required
hyperparameters = {
    "epochs": 100,
    "batch_size": 128,
}

In [None]:
import json
!mkdir -p input/config

# SageMaker passes hyperparameters to the container as strings
hyperparameters = {key: str(value) for key, value in hyperparameters.items()}
with open('input/config/hyperparameters.json', 'w') as f:
    json.dump(hyperparameters, f)
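
Because the values are stored as strings, whatever reads `hyperparameters.json` has to cast them back before use. The sketch below shows one way the TODO above could be addressed, by merging cast values into a `training` section of the model definition. It assumes that `epochs` and `batch_size` are accepted under `training`; `apply_hyperparameters` is a hypothetical helper, and how the base image actually consumes the file is not shown in this notebook.

```python
import json


def apply_hyperparameters(model_definition, raw_hyperparameters, casts):
    """Return a copy of model_definition with cast hyperparameters under 'training'."""
    training = dict(model_definition.get('training', {}))
    for name, cast in casts.items():
        if name in raw_hyperparameters:
            training[name] = cast(raw_hyperparameters[name])
    updated = dict(model_definition)
    updated['training'] = training
    return updated


# Values arrive as strings, exactly as written to hyperparameters.json
raw = json.loads('{"epochs": "100", "batch_size": "128"}')
definition = apply_hyperparameters({}, raw, {'epochs': int, 'batch_size': int})
print(definition)  # prints {'training': {'epochs': 100, 'batch_size': 128}}
```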

## Then, let's prepare a dataset

Download the input data, which should contain `training` and `validation` paths and, optionally, a `test` path.

In [None]:
dataset_bucket = os.environ.get('DATASET_BUCKET')
dataset_prefix = os.environ.get('DATASET_PREFIX')

print('dataset: {}/{}'.format(dataset_bucket, dataset_prefix))

In [None]:
# Remove only the old data so that input/config/hyperparameters.json is kept
!rm -Rf input/data
!aws s3 sync  s3://$dataset_bucket/$dataset_prefix input/data
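
After the sync, it can be worth confirming that the expected channel directories arrived before starting a training run. This is a minimal sketch; `missing_channels` is a hypothetical helper, and it assumes each channel is a directory directly under `input/data`.

```python
from pathlib import Path


def missing_channels(data_dir, required=('training', 'validation')):
    """Return the required channel directories that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in required if not (root / name).is_dir()]


missing = missing_channels('input/data')
if missing:
    print('missing channels: {}'.format(', '.join(missing)))
else:
    print('all required channels are present')
```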

## Then, let's test the training process

Make model directory and clear any existing files

In [None]:
!mkdir -p model
!rm -Rf model/*

In [None]:
print("Training ...")
!docker run --rm --name $IMAGE_REPO-train \
    -v "$PWD/model:/opt/ml/model" \
    -v "$PWD/input:/opt/ml/input"  $IMAGE_REPO:${IMAGE_TAG:-latest} train
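
The train, test, and serve runs in this notebook all follow the same pattern: local directories are mounted under `/opt/ml` and a command name is passed to the image. As a sketch of that pattern, the hypothetical helper below builds the argument list in Python; the image name is a placeholder for `$IMAGE_REPO:$IMAGE_TAG`.

```python
import os


def docker_run_args(image, command, mounts, name=None):
    """Build an argument list for `docker run` with local dirs mounted under /opt/ml."""
    args = ['docker', 'run', '--rm']
    if name:
        args += ['--name', name]
    for local_dir, target in mounts.items():
        args += ['-v', '{}:/opt/ml/{}'.format(os.path.abspath(local_dir), target)]
    return args + [image, command]


# 'my-model-repo:latest' is a placeholder for $IMAGE_REPO:$IMAGE_TAG
args = docker_run_args(
    'my-model-repo:latest', 'train',
    {'model': 'model', 'input': 'input'},
    name='my-model-repo-train',
)
print(' '.join(args))
```

The resulting list could be passed to `subprocess.run(args, check=True)` instead of using the `!docker run` shell magic.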

## Now, a basic test with a direct call to our container

In [None]:
!rm -Rf output
!mkdir -p output/data

In [None]:
!ls input/data/validation

In [None]:
print("Testing")
!docker run --rm --name $IMAGE_REPO-test \
    -v "$PWD/model:/opt/ml/model" \
    -v "$PWD/output:/opt/ml/output" \
    -v "$PWD/input:/opt/ml/input" $IMAGE_REPO:${IMAGE_TAG:-latest} test \
        '/opt/ml/input/data/validation' \
        '/opt/ml/output/data/predictions.csv' \
        '/opt/ml/output/data/test_stats.json' \
        --pandas_header=True # output the CSV head for predictions

Inspect the overall stats from the validation dataset

In [None]:
import json

with open('output/data/test_stats.json', 'r') as f:
    test_stats = json.load(f)
    
test_stats['class']['overall_stats']

Inspect the output predictions which we will use for baselining

In [None]:
!wc -l output/data/predictions.csv
!tail -3 output/data/predictions.csv

## This is the serving test. It simulates an endpoint exposed by SageMaker

After you execute the next cell, this Jupyter notebook will block until the container is stopped. While it runs, a web service is exposed on port 8080.

In [None]:
!docker run --rm --name $IMAGE_REPO-serve \
    -p 8080:8080 \
    -v "$PWD/model:/opt/ml/model" \
    -v "$PWD/input:/opt/ml/input" $IMAGE_REPO:${IMAGE_TAG:-latest} serve

> While the above cell is running, click here [TEST NOTEBOOK](02_Testing%20our%20local%20model%20server.ipynb) to run some tests.

> After you finish the tests, press **STOP**
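
For reference, a SageMaker serving container answers `GET /ping` for health checks and `POST /invocations` for inference. The sketch below builds both requests with the standard library. The helper names are hypothetical, and the `text/csv` content type and the sample payload are assumptions, since the format this image expects is defined in the test notebook rather than here.

```python
import urllib.request


def build_ping_request(host='localhost', port=8080):
    """GET /ping is the health check a SageMaker serving container answers."""
    return urllib.request.Request('http://{}:{}/ping'.format(host, port), method='GET')


def build_invocation_request(payload, content_type='text/csv', host='localhost', port=8080):
    """POST /invocations carries the inference payload (content type is an assumption)."""
    return urllib.request.Request(
        'http://{}:{}/invocations'.format(host, port),
        data=payload.encode('utf-8'),
        headers={'Content-Type': content_type},
        method='POST',
    )


request = build_invocation_request('this is a sample sentence')
print(request.get_method(), request.full_url)  # prints POST http://localhost:8080/invocations

# Send it only while the serve container is running, from another notebook:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status, response.read().decode('utf-8'))
```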

### Before we push our code to the repo, let's check the build process

In [None]:
import boto3

sts_client = boto3.client("sts")
session = boto3.session.Session()

account_id = sts_client.get_caller_identity()["Account"]
region = session.region_name
credentials = session.get_credentials()
credentials = credentials.get_frozen_credentials()

base_repo_name = os.environ['BASE_REPO']
repo_name = os.environ['IMAGE_REPO']
image_tag = 'test'

In [None]:
!mkdir -p tests
!cp model_definition.yml Dockerfile buildspec.yml tests/
# Write the environment file consumed by the local CodeBuild run
with open('tests/vars.env', 'w') as f:
    f.write("AWS_ACCOUNT_ID=%s\n" % account_id)
    f.write("IMAGE_TAG=%s\n" % image_tag)
    f.write("BASE_REPO_NAME=%s\n" % base_repo_name)
    f.write("IMAGE_REPO_NAME=%s\n" % repo_name)
    f.write("AWS_DEFAULT_REGION=%s\n" % region)
    f.write("AWS_ACCESS_KEY_ID=%s\n" % credentials.access_key)
    f.write("AWS_SECRET_ACCESS_KEY=%s\n" % credentials.secret_key)
    f.write("AWS_SESSION_TOKEN=%s\n" % credentials.token)

!cat tests/vars.env
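
Note that printing `tests/vars.env` with `cat` leaves temporary AWS credentials in the notebook output. If the notebook may be shared, one option is to mask credential values before displaying the file. The sketch below is one way to do that; `redact_env_lines` and its marker list are hypothetical, and the sample values are made up.

```python
SECRET_MARKERS = ('SECRET', 'TOKEN', 'ACCESS_KEY')


def redact_env_lines(lines):
    """Mask the value of any KEY=value line whose key looks like a credential."""
    redacted = []
    for line in lines:
        key, sep, value = line.rstrip('\n').partition('=')
        if sep and any(marker in key for marker in SECRET_MARKERS):
            value = '****'
        redacted.append(key + sep + value)
    return redacted


# Made-up sample lines in the same KEY=value format as tests/vars.env
sample = [
    'AWS_DEFAULT_REGION=us-east-1\n',
    'AWS_SECRET_ACCESS_KEY=example-secret\n',
    'AWS_SESSION_TOKEN=example-token\n',
]
print('\n'.join(redact_env_lines(sample)))
```

To apply it to the real file, the lines from `open('tests/vars.env')` can be passed in place of `sample`.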

In [None]:
%%time

!/tmp/aws-codebuild/local_builds/codebuild_build.sh \
    -a "$PWD/tests/output" \
    -s "$PWD/tests" \
    -i "samirsouza/aws-codebuild-standard:2.0" \
    -e "$PWD/tests/vars.env" \
    -c

## OK, now it's time to push everything to the repo

In [None]:
%%bash

cd ../../../mlops-workshop-images/$IMAGE_REPO
cp $OLDPWD/buildspec.yml $OLDPWD/model_definition.yml $OLDPWD/Dockerfile .

git add --all
git commit -a -m " - files for building $IMAGE_REPO image"
git push

### OK, now open the AWS console in another tab and go to the CodePipeline console to see the status of our build pipeline