# VI - Tensorboard Visualization
This notebook shows how to create an SSH tunnel from the machine running this notebook to the compute node that is running (or has run) a task whose output includes [Tensorboard](https://www.tensorflow.org/get_started/summaries_and_tensorboard)-compatible summary logs.

**NOTE:** This notebook cannot be run on Azure Notebooks due to restrictions, so please run it locally. If you are running this notebook on Windows, ensure that `ssh.exe` is in your `%PATH%`. You can download OpenSSH binaries for Windows [here](https://github.com/PowerShell/Win32-OpenSSH/releases).
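
Conceptually, the tunnel works like standard OpenSSH local port forwarding. A local port is forwarded through an SSH connection to a port on the compute node. The sketch below checks that an `ssh` executable is on the `PATH` and builds such a command. It is not the command Batch Shipyard runs internally. The user name, host, SSH port and remote port are hypothetical placeholders, and `shipyard misc tensorboard` chooses the real values for you.

```python
import shutil


def ssh_available():
    """Return True if an ssh executable (ssh or ssh.exe) is found on PATH."""
    # shutil.which honours PATHEXT on Windows, so "ssh" also finds ssh.exe.
    return shutil.which("ssh") is not None


def build_tunnel_command(user, host, ssh_port, local_port=6006, remote_port=6006):
    """Assemble an OpenSSH local port-forwarding command (hypothetical values).

    Traffic to localhost:<local_port> on this machine is forwarded to
    port <remote_port> on the remote host ("localhost" is resolved there).
    """
    return [
        "ssh",
        "-N",                                      # forward ports only, no remote command
        "-p", str(ssh_port),                       # SSH port exposed for the node
        "-L", "{}:localhost:{}".format(local_port, remote_port),
        "{}@{}".format(user, host),
    ]


print("ssh on PATH:", ssh_available())
# Placeholder user/host/port for illustration only.
print(" ".join(build_tunnel_command("shipyard", "203.0.113.10", 50000)))
```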

* [Setup](#section1)
* [Configure job](#section2)
* [Submit job](#section3)
* [Delete job](#section4)

<a id='section1'></a>

## Setup

Create a simple alias for Batch Shipyard:

In [1]:
%alias shipyard SHIPYARD_CONFIGDIR=config python $HOME/batch-shipyard/shipyard.py %l
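
The `%alias` magic above runs `shipyard.py` with the `SHIPYARD_CONFIGDIR` environment variable set to `config`. Outside IPython, roughly the same invocation can be made with `subprocess`. In this minimal sketch the helper names are hypothetical. It also uses the current interpreter (`sys.executable`) rather than whatever `python` is on the `PATH`.

```python
import os
import subprocess
import sys

# Mirrors the path used in the %alias above.
DEFAULT_SCRIPT = os.path.join(os.path.expanduser("~"), "batch-shipyard", "shipyard.py")


def shipyard_invocation(args, configdir="config", script=DEFAULT_SCRIPT):
    """Return (argv, env) equivalent to the %alias defined above."""
    env = dict(os.environ, SHIPYARD_CONFIGDIR=configdir)
    argv = [sys.executable, script] + list(args)
    return argv, env


def run_shipyard(args, **kwargs):
    """Run shipyard.py, block until it exits, and return its exit code."""
    argv, env = shipyard_invocation(args, **kwargs)
    return subprocess.call(argv, env=env)

# Example (requires Batch Shipyard at DEFAULT_SCRIPT):
# run_shipyard(["--version"])
```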

Check that everything is working:

In [2]:
shipyard --version

shipyard.py, version 2.8.0


Get some variables stored in the Setup notebook:

In [3]:
import json

def read_json(filename):
    with open(filename, 'r') as infile:
        return json.load(infile)
    
account_info = read_json('account_information.json')

IMAGE_NAME = account_info['IMAGE_NAME']
STORAGE_ALIAS = account_info['STORAGE_ALIAS']
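
If `account_information.json` was not fully written by the Setup notebook, the lookups above fail with a bare `KeyError` on the first missing key. A slightly more defensive variant reports every missing key at once. `REQUIRED_KEYS` and `load_account_info` are hypothetical names used only for this sketch.

```python
import json

# Keys this notebook reads from the Setup notebook's output.
REQUIRED_KEYS = ("IMAGE_NAME", "STORAGE_ALIAS")


def load_account_info(filename="account_information.json", required=REQUIRED_KEYS):
    """Load the Setup notebook's variables and fail clearly if any are missing."""
    with open(filename, "r") as infile:
        info = json.load(infile)
    missing = [key for key in required if key not in info]
    if missing:
        raise KeyError("missing keys in {}: {}".format(filename, ", ".join(missing)))
    return info
```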

<a id='section2'></a>
## Configure job
The following will be similar to the [Single GPU Training](02_Single_GPU_Training.ipynb) notebook from earlier.

For the `jobs` configuration, we add `--tensorboard_logdir tensorboard_logs` to the training command so that the script writes Tensorboard summary logs to the `tensorboard_logs` directory during the run.

In [4]:
TASK_ID = 'run_cifar10' # This should be changed per task

JOB_ID = 'cntk-train-tensorboard-job'

COMMAND = 'bash -c "source /cntk/activate-cntk; python -u ConvNet_CIFAR10.py --datadir $AZ_BATCH_NODE_SHARED_DIR/data --tensorboard_logdir tensorboard_logs"'

jobs = {
    "job_specifications": [
        {
            "id": JOB_ID,
            "tasks": [
                {
                    "id": TASK_ID,
                    "image": IMAGE_NAME,
                    "remove_container_after_exit": True,
                    "command": COMMAND,
                    "gpu": True,
                    "resource_files": [
                        {
                            "file_path": "ConvNet_CIFAR10.py",
                            "blob_source": "https://batchshipyardexamples.blob.core.windows.net/code/ConvNet_CIFAR10.py",
                            "file_mode": "0777"
                        }
                    ],
                    "output_data": {
                        "azure_storage": [
                            {
                                "storage_account_settings": STORAGE_ALIAS,
                                "container": "output",
                                "source": "$AZ_BATCH_TASK_WORKING_DIR/Models"
                            },
                        ]
                    },
                }
            ],
        }
    ]
}
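
Because `TASK_ID` must be unique within the job, adding more runs means copying the task dictionary. The sketch below assumes every run uses the same script, image and output settings as the cell above. Its small helper keeps the Tensorboard log directory in one place, so the `--tensorboard_logdir` value cannot drift between copies. `make_task` and its parameters are hypothetical and are not part of Batch Shipyard.

```python
def make_task(task_id, image, storage_alias, logdir="tensorboard_logs"):
    """Build one task entry shaped like the task in the jobs dictionary above."""
    command = (
        'bash -c "source /cntk/activate-cntk; '
        'python -u ConvNet_CIFAR10.py --datadir $AZ_BATCH_NODE_SHARED_DIR/data '
        '--tensorboard_logdir {}"'.format(logdir)
    )
    return {
        "id": task_id,
        "image": image,
        "remove_container_after_exit": True,
        "command": command,
        "gpu": True,
        "resource_files": [
            {
                "file_path": "ConvNet_CIFAR10.py",
                "blob_source": "https://batchshipyardexamples.blob.core.windows.net/code/ConvNet_CIFAR10.py",
                "file_mode": "0777",
            }
        ],
        "output_data": {
            "azure_storage": [
                {
                    "storage_account_settings": storage_alias,
                    "container": "output",
                    "source": "$AZ_BATCH_TASK_WORKING_DIR/Models",
                }
            ]
        },
    }

# Example: two runs in the same job, each with a unique task id.
# tasks = [make_task("run_cifar10_{}".format(i), IMAGE_NAME, STORAGE_ALIAS) for i in range(2)]
```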

Write the jobs configuration to the `jobs.json` file:

In [5]:
import json
import os

def write_json_to_file(json_dict, filename):
    """ Simple function to write JSON dictionaries to files
    """
    with open(filename, 'w') as outfile:
        json.dump(json_dict, outfile)

write_json_to_file(jobs, os.path.join('config', 'jobs.json'))
print(json.dumps(jobs, indent=4, sort_keys=True))

{
    "job_specifications": [
        {
            "id": "cntk-train-tensorboard-job", 
            "tasks": [
                {
                    "command": "bash -c \"source /cntk/activate-cntk; python -u ConvNet_CIFAR10.py --datadir $AZ_BATCH_NODE_SHARED_DIR/data --tensorboard_logdir tensorboard_logs\"", 
                    "gpu": true, 
                    "id": "run_cifar10", 
                    "image": "microsoft/cntk:2.0-gpu-python3.5-cuda8.0-cudnn5.1", 
                    "output_data": {
                        "azure_storage": [
                            {
                                "container": "output", 
                                "source": "$AZ_BATCH_TASK_WORKING_DIR/Models", 
                                "storage_account_settings": "mystorageaccount"
                            }
                        ]
                    }, 
                    "remove_container_after_exit": true, 
                    "resource_files": [
                        {
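
Batch Shipyard reads `config/jobs.json` from disk, not the in-memory `jobs` object. Before submitting, it can therefore help to confirm that the file parses back to the same dictionary. Here is a minimal check, assuming the `config` directory used by the alias. `check_json_roundtrip` is a hypothetical helper, and equality holds here because `jobs` contains only strings, booleans, lists and dicts.

```python
import json
import os


def check_json_roundtrip(expected, filename=os.path.join("config", "jobs.json")):
    """Return True if `filename` contains JSON equal to the `expected` dict."""
    with open(filename, "r") as infile:
        return json.load(infile) == expected

# Example: assert check_json_roundtrip(jobs)
```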

<a id='section3'></a>

## Submit job
Check that everything is OK with our pool before we submit our job:


In [6]:
shipyard pool listnodes

2017-07-12 14:45:45,080 DEBUG - listing nodes for pool gpupool
2017-07-12 14:45:45,403 INFO - node_id=tvm-1392786932_1-20170712t132611z [state=ComputeNodeState.idle start_task_exit_code=0 scheduling_state=SchedulingState.enabled ip_address=10.0.0.6 vm_size=standard_nc6 dedicated=True total_tasks_run=2 running_tasks_count=0 total_tasks_succeeded=2]
2017-07-12 14:45:45,403 INFO - node_id=tvm-1392786932_2-20170712t132611z [state=ComputeNodeState.idle start_task_exit_code=0 scheduling_state=SchedulingState.enabled ip_address=10.0.0.5 vm_size=standard_nc6 dedicated=True total_tasks_run=3 running_tasks_count=0 total_tasks_succeeded=3]
2017-07-12 14:45:45,403 INFO - node_id=tvm-1392786932_3-20170712t132611z [state=ComputeNodeState.idle start_task_exit_code=0 scheduling_state=SchedulingState.enabled ip_address=10.0.0.4 vm_size=standard_nc6 dedicated=True total_tasks_run=5 running_tasks_count=0 total_tasks_succeeded=5]
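
On larger pools, scanning these lines by eye gets tedious. The sketch below parses the `key=value` pairs from `shipyard pool listnodes` output, in the form shown above. It flags nodes that are neither idle nor running, and nodes whose start task did not exit with code 0. It relies only on the log format seen in this notebook, which is not a stable interface and may differ between Batch Shipyard versions. `parse_node_line` and `unhealthy_nodes` are hypothetical helpers.

```python
import re

# Matches tokens such as "state=ComputeNodeState.idle" or "ip_address=10.0.0.6";
# the value stops at whitespace or the closing "]".
_PAIR = re.compile(r"(\w+)=([^\s\]]+)")


def parse_node_line(line):
    """Return a dict of the key=value pairs in one listnodes log line."""
    return dict(_PAIR.findall(line))


def unhealthy_nodes(log_text):
    """List node_ids that are not idle/running or whose start task failed."""
    bad = []
    for line in log_text.splitlines():
        if "node_id=" not in line:
            continue  # skip lines such as "listing nodes for pool gpupool"
        fields = parse_node_line(line)
        state = fields.get("state", "").split(".")[-1]
        if state not in ("idle", "running") or fields.get("start_task_exit_code") != "0":
            bad.append(fields.get("node_id"))
    return bad
```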


Now that we have confirmed the pool is healthy, we can submit our job with the command below. Note that we do not use the `--tail` option, so the command returns as soon as the task is submitted and we can tunnel to Tensorboard while the task is still executing.

In [7]:
shipyard jobs add

2017-07-12 14:45:49,216 INFO - Adding job cntk-train-tensorboard-job to pool gpupool
2017-07-12 14:45:49,784 INFO - uploading file /tmp/tmp5o3cih as u'shipyardtaskrf-cntk-train-tensorboard-job/run_cifar10.shipyard.envlist'
2017-07-12 14:45:50,016 DEBUG - submitting 1 tasks (0 -> 0) to job cntk-train-tensorboard-job
2017-07-12 14:45:50,308 INFO - submitted all 1 tasks to job cntk-train-tensorboard-job


Next, run the Batch Shipyard command that starts a Tensorboard instance and creates an SSH tunnel to it. We first use `shipyard jobs listtasks` to confirm that the task is running. If everything is working, the `shipyard misc tensorboard` cell will not return immediately. Browse to the URL printed by the command, http://localhost:6006/. Because the call blocks, its output may not appear in the notebook until the cell finishes.

**Notes:**
1. The Tensorboard instance may take some time to start since this pool does not have the TensorFlow Docker image pre-loaded.
2. You will need to manually interrupt the kernel once you are done with your Tensorboard visualization.
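
As note 1 says, and as the command's own output advises, Tensorboard may not answer straight away, so the URL has to be retried. While the cell below blocks this kernel, the retrying can be scripted from a separate Python session. The sketch below polls a URL with the standard `urllib` module until it answers or a timeout expires. The function name and default timings are assumptions, not part of Batch Shipyard.

```python
import time
import urllib.error
import urllib.request


def wait_for_url(url="http://localhost:6006/", timeout=300.0, interval=5.0):
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass.

    Returns True once the server answers, False if the deadline is reached.
    """
    # Bypass any configured HTTP proxy, since the tunnel endpoint is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, even if with an error status
        except OSError:
            pass  # not reachable yet (URLError is a subclass of OSError)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```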

In [8]:
shipyard jobs listtasks

2017-07-12 14:45:57,169 INFO - job_id=cntk-train-tensorboard-job task_id=run_cifar10 [state=TaskState.running max_retries=0 retention_time=10675199 days, 2:48:05.477581 pool_id=gpupool node_id=tvm-1392786932_3-20170712t132611z start_time=2017-07-12 14:45:50.862843+00:00 end_time=None duration=n/a exit_code=None]


In [9]:
shipyard misc tensorboard --jobid $JOB_ID --taskid $TASK_ID -y

2017-07-12 14:46:01,347 DEBUG - waiting for task run_cifar10 in job cntk-train-tensorboard-job to reach a valid state
2017-07-12 14:46:01,557 DEBUG - using auto-detected logdir: tensorboard_logs
2017-07-12 14:46:01,557 DEBUG - using logpath: /mnt/batch/tasks/workitems/cntk-train-tensorboard-job/job-1/run_cifar10/wd/tensorboard_logs
2017-07-12 14:46:01,749 INFO - 

>> Please connect to Tensorboard at http://localhost:6006/

>> Note that Tensorboard may take a while to start if the Docker is
>> not present. Please keep retrying the URL every few seconds.

>> Terminate your session with CTRL+C

>> If you cannot terminate your session cleanly, run:
     shipyard pool ssh --nodeid tvm-1392786932_3-20170712t132611z sudo docker kill e32f11ac

Connection to 52.168.26.170 closed.
2017-07-12 14:46:08,978 DEBUG - attempting clean up of Tensorboard instance and SSH tunnel
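
Because the Tensorboard cell blocks, the kernel has to be interrupted by hand, which triggers the clean-up message above. One alternative, sketched below for POSIX systems only, is to start the command as a background process with `subprocess.Popen`. You can later send it `SIGINT`, the same signal CTRL+C sends, so it can clean up the Tensorboard instance and SSH tunnel as it did here. The helper names are hypothetical. The background process writes to the kernel's own stdout/stderr (usually the terminal that launched Jupyter), not to the notebook.

```python
import signal
import subprocess


def start_background(argv, env=None):
    """Start `argv` and return immediately with the Popen handle."""
    return subprocess.Popen(argv, env=env)


def stop_background(proc, timeout=60):
    """Send SIGINT (like CTRL+C) so the process can clean up, then wait for it.

    Falls back to kill() if it has not exited after `timeout` seconds.
    Returns the exit code. POSIX only: Windows does not support SIGINT here.
    """
    if proc.poll() is None:
        proc.send_signal(signal.SIGINT)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode

# Example (assumes the shipyard.py path and config directory used by the alias):
# proc = start_background(
#     ["python", os.path.expanduser("~/batch-shipyard/shipyard.py"),
#      "misc", "tensorboard", "--jobid", JOB_ID, "--taskid", TASK_ID, "-y"],
#     env=dict(os.environ, SHIPYARD_CONFIGDIR="config"))
# ... browse to http://localhost:6006/ ...
# stop_background(proc)
```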



<a id='section4'></a>

## Delete job

To delete the job, use the command below. Be aware that this removes all of the files created by the job and its tasks.

In [10]:
shipyard jobs del -y --termtasks --wait

2017-07-12 14:48:52,609 INFO - Deleting job: cntk-train-tensorboard-job
2017-07-12 14:48:52,609 DEBUG - disabling job cntk-train-tensorboard-job first due to task termination
2017-07-12 14:48:53,527 INFO - Terminating task: run_cifar10
cntk-train-tensorboard-job-run_cifar10
Connection to 52.168.26.170 closed.
2017-07-12 14:48:54,976 DEBUG - waiting for task run_cifar10 in job cntk-train-tensorboard-job to terminate
2017-07-12 14:48:55,333 DEBUG - waiting for job cntk-train-tensorboard-job to delete
2017-07-12 14:49:27,329 INFO - job cntk-train-tensorboard-job does not exist


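Once you no longer need the GPU pool, delete it as well so that you stop incurring charges for its compute nodes:

In [11]: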
shipyard pool del -y --wait

2017-07-12 14:50:20,221 INFO - Deleting pool: gpupool
2017-07-12 14:50:20,503 DEBUG - clearing table (pk=batch0e43a94eba$gpupool): shipyardregistry
2017-07-12 14:50:20,803 DEBUG - clearing table (pk=batch0e43a94eba$gpupool): shipyardgr
2017-07-12 14:50:20,912 DEBUG - clearing table (pk=batch0e43a94eba$gpupool): shipyardperf
2017-07-12 14:50:20,962 DEBUG - clearing table (pk=batch0e43a94eba$gpupool): shipyarddht
2017-07-12 14:50:21,013 DEBUG - clearing table (pk=batch0e43a94eba$gpupool): shipyardimages
2017-07-12 14:50:21,115 DEBUG - clearing table (pk=batch0e43a94eba$gpupool): shipyardtorrentinfo
2017-07-12 14:50:21,166 DEBUG - deleting queue: shipyardgr-batch0e43a94eba-gpupool
2017-07-12 14:50:21,420 DEBUG - deleting container: shipyardtor-batch0e43a94eba-gpupool
2017-07-12 14:50:21,637 DEBUG - deleting container: shipyardrf-batch0e43a94eba-gpupool
2017-07-12 14:50:21,684 DEBUG - waiting for pool gpupool to delete
