# Docker Image
We create a Docker image for running the `TrainTestClassifier.py` script created by [the second notebook](01_Training_Script.ipynb) on the Batch AI cluster.

The steps are
- [import the libraries](#import),
- [create the dotenv file shared between notebooks](#dotenv),
- [create the dockerfile](#dockerfile),
- [create the Docker image](#create), and
- [test the Docker image](#test).

## Import the libraries  <a id='import'></a>

In [None]:
%load_ext dotenv
import os
from os import path
import json
import shutil
import dotenv

## Create and import the dotenv <a id='dotenv'></a>
Create a new `.env` file to contain names used in multiple notebooks.

In [None]:
dotenv_path = os.path.join('.', '.env')  # The location of the dotenv file
if os.path.isfile(dotenv_path):          # Remove any pre-existing dotenv file to ensure a blank slate
    os.remove(dotenv_path)
with open(dotenv_path, 'w'):             # Create an empty dotenv file
    pass

Write your Docker login and image repository name to the `.env` file.

In [None]:
dotenv.set_key(dotenv_path, 'docker_login', 'YOUR_DOCKER_LOGIN')
dotenv.set_key(dotenv_path, 'image_repo', '/mlbaiht')

In [None]:
dotenv.set_key(dotenv_path, 'docker_login', 'mabouatmicrosoft')
dotenv.set_key(dotenv_path, 'image_repo', '/mlbaiht')

Import the contents of the `.env` file into the environment. The `-o` flag overrides any variables that are already set.

In [None]:
%dotenv -o
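
To make the override behaviour concrete, here is a minimal, standard-library-only sketch of roughly what loading a `.env` file amounts to. The helper `load_env_file` and the `demo_*` keys are hypothetical names introduced for illustration, and the parsing is a deliberate simplification rather than python-dotenv's actual parser.

```python
import os
import tempfile


def load_env_file(env_path, override=True):
    """Copy KEY=VALUE pairs from env_path into os.environ and return them.

    Simplified illustration only: no multi-line values, escapes, or
    variable interpolation are handled.
    """
    loaded = {}
    with open(env_path) as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith('#') or '=' not in line:
                continue
            key, _, value = line.partition('=')
            key = key.strip()
            value = value.strip().strip('\'"')  # Drop surrounding quotes
            loaded[key] = value
            # With override=False, variables that already exist are kept
            if override or key not in os.environ:
                os.environ[key] = value
    return loaded


# Demonstration with hypothetical keys so the real settings are untouched
os.environ['demo_image_repo'] = '/stale'
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, '.env')
    with open(demo_path, 'w') as demo_file:
        demo_file.write("demo_docker_login='YOUR_DOCKER_LOGIN'\n")
        demo_file.write("demo_image_repo='/mlbaiht'\n")
    load_env_file(demo_path, override=False)
    kept_value = os.environ['demo_image_repo']      # Existing value survives
    demo_values = load_env_file(demo_path, override=True)
    replaced_value = os.environ['demo_image_repo']  # Value from the file wins
```

Overriding matters here because the `.env` file is recreated at the top of this notebook, so stale values left in the kernel's environment from an earlier run should not take precedence.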

## Create the dockerfile <a id='dockerfile'></a>
Create the Docker directory.

In [None]:
!mkdir -p Docker

Add to the directory the requirements file that specifies the Python modules needed to run the training script.

In [None]:
%%writefile Docker/requirements.txt

lightgbm==2.1.2
pandas==0.23.4
scikit-learn==0.19.1


Add to the directory the dockerfile specifying the build.

In [None]:
%%writefile Docker/Dockerfile

# Start from a Python image
FROM python:3.5-stretch

# Copy into the image the definition of the requirements
COPY requirements.txt .

# Install the requirements
RUN python -m pip install -r requirements.txt


## Create the Docker image <a id='create'></a>

Compose the name of the Docker image that we are creating from the login and repository name stored in the `.env` file.

In [None]:
image_name = os.getenv('docker_login') + os.getenv('image_repo')
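
If either variable is missing, `os.getenv` returns `None` and the concatenation fails with a `TypeError` that does not say which name was absent. As a sketch, a small helper can report the missing names instead. The function `build_image_name` is a hypothetical addition, not part of the original notebook.

```python
import os


def build_image_name(login_var='docker_login', repo_var='image_repo'):
    """Return login + repo from the environment, or raise a clear error."""
    login = os.getenv(login_var)
    repo = os.getenv(repo_var)
    # Collect the names of variables that are unset or empty
    missing = [name for name, value in ((login_var, login), (repo_var, repo))
               if not value]
    if missing:
        raise RuntimeError(
            'Missing environment variable(s): {}. '
            'Re-run the dotenv cells first.'.format(', '.join(missing)))
    return login + repo
```

With the `.env` file loaded, `image_name = build_image_name()` would give the same result as the cell above.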

Build the image. The first time this is run, this could take almost a minute.

In [None]:
%%time
print('Creating Docker image {}'.format(image_name))
!docker build -t $image_name Docker --no-cache

Push the image to the Docker repository.

In [None]:
%%time
!sudo docker push $image_name

## Test the Docker image <a id='test'></a>
We can now test our image locally with our script. The `--volume` argument maps the current local directory, which contains our data and script, to `/data` in the container. We then run Python in the container with the path to the `TrainTestClassifier.py` script and the script arguments, including `--inputs /data`, the path to the directory that contains the input files. The remaining script arguments are the same as those in the last cell of the [training script creation notebook](http://localhost:8888/notebooks/01_Training_Script.ipynb), and the results should be similar.

This should take around five minutes.

In [None]:
%%time
!docker run --volume $(pwd):/data $image_name python /data/TrainTestClassifier.py --inputs /data --match 5 --estimators 1000 --ngrams 2 --min_child_samples 10
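
The same command can also be assembled in Python, which makes the volume mapping and the script arguments easier to inspect or reuse. This is only a sketch: `docker_run_command` is a hypothetical helper, and the image name below is the placeholder from the `.env` cell rather than a real repository.

```python
import os
import shlex


def docker_run_command(image_name, host_dir, script, script_args):
    """Build the `docker run` argument list without executing it."""
    container_dir = '/data'
    command = [
        'docker', 'run',
        # Map the host directory to /data inside the container
        '--volume', '{}:{}'.format(os.path.abspath(host_dir), container_dir),
        image_name,
        'python', '{}/{}'.format(container_dir, script),
    ]
    command.extend(str(arg) for arg in script_args)
    return command


demo_command = docker_run_command(
    'YOUR_DOCKER_LOGIN/mlbaiht', '.', 'TrainTestClassifier.py',
    ['--inputs', '/data', '--match', 5, '--estimators', 1000,
     '--ngrams', 2, '--min_child_samples', 10])

# Print a shell-quoted version of the command for inspection
print(' '.join(shlex.quote(part) for part in demo_command))
```

The resulting list could be passed to `subprocess.run(demo_command, check=True)` to run the container from Python instead of through the `!` shell escape.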

In [the next notebook](03_Configure_Batch_AI.ipynb), we create a file to contain the Batch AI configuration and some Azure resources we will use to create the Batch AI cluster.