# Environment setup
To start, we'll set the S3 bucket name, load the training data into S3, and upload the customized training container to Elastic Container Registry (ECR).

If you don't have an S3 bucket to use, set one up now and note the bucket name.
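Because the bucket locations in this notebook are written as `s3://` URIs, it can help to split them into bucket and prefix before passing them to other tools. The following is a minimal sketch, not part of the original notebook: `split_s3_uri` is a hypothetical helper, the bucket name in the usage line is a placeholder, and the regular expression covers only part of the S3 bucket naming rules (length, allowed characters, first and last character).

```python
import re

# Partial check of the S3 bucket naming rules: 3-63 characters, lowercase
# letters, digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    if not uri.startswith("s3://"):
        raise ValueError("not an S3 URI: %r" % uri)
    bucket, _, prefix = uri[len("s3://"):].partition("/")
    # Reject names that break the basic rules, including consecutive dots.
    if not _BUCKET_RE.match(bucket) or ".." in bucket:
        raise ValueError("invalid bucket name: %r" % bucket)
    return bucket, prefix


# Placeholder URI, used only for illustration.
bucket, prefix = split_s3_uri("s3://my-example-bucket/criteo/temp")
print(bucket, prefix)
```

Catching a malformed name here is cheaper than discovering it when a training job fails to find its data.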

In [None]:
import sys
sys.path.append('/home/ec2-user/anaconda3/lib/python3.6/site-packages/')

import boto3
import re
from sagemaker import get_execution_role

assumed_role = get_execution_role()
print(str(assumed_role))

In [None]:
# S3 locations: a temporary working prefix and the Criteo dataset prefix
temp_s3 = "s3://cnidus-ml-pdx/criteo/temp"

dataset_s3 = "s3://sagemaker-us-west-2-369233609183/datasets/criteo-16-tb/criteo_20180605_141913/"

In [None]:
!aws s3 ls s3://sagemaker-us-west-2-369233609183/datasets/criteo-16-tb/criteo_20180605_141913/ | wc -l
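The same count can be obtained from Python, which is convenient when the result feeds into later cells. This is a rough sketch assuming a boto3 S3 client: `count_listing_entries` is a hypothetical helper, and it counts objects plus immediate sub-prefixes under the given prefix, which should roughly match the number of lines printed by a non-recursive `aws s3 ls`.

```python
def count_listing_entries(client, bucket, prefix):
    """Count objects and sub-prefixes directly under s3://bucket/prefix."""
    paginator = client.get_paginator("list_objects_v2")
    total = 0
    # Delimiter="/" keeps the listing to one level, like `aws s3 ls` without --recursive.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        total += len(page.get("Contents", []))
        total += len(page.get("CommonPrefixes", []))
    return total


# Usage (requires AWS credentials; the bucket and prefix are placeholders):
#   import boto3
#   count_listing_entries(boto3.client("s3"), "my-example-bucket", "datasets/criteo/")
```

The paginator matters here because a single `list_objects_v2` call returns at most 1,000 keys.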

## Downloading the criteo dataset
We'll download the dataset from the web and load it into S3.

The source is here:
http://labs.criteo.com/2013/12/download-terabyte-click-logs-2/

In [None]:
%%bash
mkdir -p /tmp/criteo
cd /tmp/criteo
curl -O http://azuremlsampleexperiments.blob.core.windows.net/criteo/day_0.gz
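Before uploading a file of this size, it may be worth confirming that the download is a readable gzip archive. The sketch below is an illustration rather than part of the original workflow: `peek_gzip_lines` is a hypothetical helper, and it assumes the records are tab-separated text. It streams only the first few lines, so it does not decompress the whole file.

```python
import gzip
import itertools


def peek_gzip_lines(path, n=3):
    """Return the first n lines of a gzip-compressed text file, split on tabs."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        # islice stops after n lines, so the rest of the archive is never read.
        return [line.rstrip("\n").split("\t") for line in itertools.islice(handle, n)]


# Usage, once the download above has finished:
#   for fields in peek_gzip_lines("/tmp/criteo/day_0.gz"):
#       print(len(fields), fields[:3])
```

A truncated or corrupt download will typically raise an error from `gzip` at this point instead of later, during training.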

In [None]:
# Upload to S3
# TODO: upload /tmp/criteo/day_0.gz to the S3 bucket
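One way the upload step could look is sketched below, assuming a boto3 S3 client. This is not the author's implementation: `upload_to_s3` is a hypothetical helper, and the bucket name and key prefix in the usage comment are placeholders to replace with your own.

```python
import os


def upload_to_s3(client, local_path, bucket, key_prefix):
    """Upload one local file under s3://bucket/key_prefix/ and return its URI."""
    # Build the object key from the prefix and the file name, avoiding a leading slash.
    key = "/".join([key_prefix.strip("/"), os.path.basename(local_path)]).lstrip("/")
    client.upload_file(local_path, bucket, key)
    return "s3://{}/{}".format(bucket, key)


# Usage (requires AWS credentials; bucket and prefix are placeholders):
#   import boto3
#   upload_to_s3(boto3.client("s3"), "/tmp/criteo/day_0.gz",
#                "my-example-bucket", "criteo/raw")
```

`upload_file` switches to multipart uploads for large files, which suits archives of this size.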

## Download the script-mode container

In [None]:
%%bash
git clone -b mvs-script-mode-pipe-ps-server https://github.com/mvsusp/sagemaker-tensorflow-containers.git

Install dependencies (is this needed on a SageMaker notebook?)

In [None]:
!pwd

In [None]:
import os
print(os.getcwd())
# Uncomment to run from the benchmark directory of the cloned repository:
#os.chdir('sagemaker-tensorflow-containers/test/integration/benchmarks/criteo')
print(os.getcwd())
from sagemaker.tensorflow import TensorFlow

# Point this at your Criteo small-clicks or large-clicks dataset:
# https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/criteo_tft#criteo-dataset
CRITEO_DATASET = dataset_s3

hyperparameters = {
    # sets the number of parameter servers in the cluster.
    'sagemaker_num_parameter_servers': 10,
    's3_channel':                      CRITEO_DATASET,
    'batch_size':                      30000,
    'dataset':                         'kaggle',
    'model_type':                      'linear',
    'l2_regularization':               100,

    # see https://www.tensorflow.org/performance/performance_guide#optimizing_for_cpu
    # Best value for this model is 10, default value in the container is 0.
    # 0 sets the value to the number of logical cores.
    'inter_op_parallelism_threads':    10,

    # environment variables that will be written to the container before training starts
    'sagemaker_env_vars':              {
        # True uses HTTPS; otherwise HTTP is used. Default: False
        # see https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/client-config.html
        'S3_USE_HTTPS':  True,
        # True verifies SSL. Default: False
        'S3_VERIFY_SSL': True,
        # Sets the time, in milliseconds, that a thread should wait after completing the
        # execution of a parallel region before sleeping. Default: 0
        # see https://github.com/tensorflow/tensorflow/blob/faff6f2a60a01dba57cf3a3ab832279dbe174798/tensorflow/docs_src/performance/performance_guide.md#tuning-mkl-for-the-best-performance
        'KMP_BLOCKTIME': 25
    }
}

tf = TensorFlow(entry_point='task.py',
                source_dir='trainer',
                train_instance_count=10,
                train_instance_type='ml.c5.9xlarge',
                # pass in your own SageMaker role
                role=assumed_role,
                hyperparameters=hyperparameters)

# This points to the prototype images.
# Change the region (to us-west-2 or us-east-2) or TF version (to 1.7.0) if needed
tf.train_image = lambda: '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow:1.6.0-cpu-py2-script-mode-preview'

# publicly accessible placeholder data. Change the region if needed
tf.fit({'training': 's3://sagemaker-sample-data-us-west-2/spark/mnist/train'}, run_tensorboard_locally=True)
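A job on ten `ml.c5.9xlarge` instances is expensive to launch by mistake, so a quick sanity check on the configuration before calling `fit` can be worthwhile. The sketch below is purely illustrative: `check_training_config` is a hypothetical helper, and it assumes, without confirmation from the container's documentation, that each parameter server runs on one of the training instances and that hyperparameter values need to be JSON-serializable.

```python
import json


def check_training_config(hyperparameters, instance_count):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # Assumption: parameter servers are hosted on the training instances themselves.
    num_ps = hyperparameters.get("sagemaker_num_parameter_servers", 0)
    if num_ps > instance_count:
        problems.append(
            "%d parameter servers requested but only %d instances" % (num_ps, instance_count)
        )
    # Assumption: every hyperparameter value must survive JSON encoding.
    for name, value in hyperparameters.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            problems.append("hyperparameter %r is not JSON-serializable" % name)
    return problems


# Usage with the notebook's own values:
#   check_training_config(hyperparameters, instance_count=10)
```

With the values in the cell above (10 parameter servers, 10 instances), the helper should report no problems.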