# TensorFlow Script Mode with Pipe Mode Input


SageMaker Pipe Mode is an input mechanism for SageMaker training containers based on Linux named pipes. SageMaker makes the data available to the training container using named pipes, which allows data to be downloaded from S3 to the container while training is running. For larger datasets, this dramatically improves the time to start training, as the data does not need to be first downloaded to the container. To learn more about pipe mode, please consult the AWS documentation at: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-trainingdata.
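To make the named-pipe mechanism concrete, here is a minimal sketch using only the Python standard library on Linux. It is not SageMaker code: the pipe name `train_0` and the helper `stream_through_fifo` are hypothetical, and a background thread stands in for the process that streams data from S3. It shows that a reader can start consuming bytes from a FIFO while the writer is still producing them.

```python
import os
import tempfile
import threading


def stream_through_fifo(chunks):
    """Write chunks into a named pipe from one thread and read them back here."""
    fifo_dir = tempfile.mkdtemp()
    fifo_path = os.path.join(fifo_dir, "train_0")  # hypothetical pipe name
    os.mkfifo(fifo_path)

    def producer():
        # Opening a FIFO for writing blocks until a reader opens the other end.
        with open(fifo_path, "wb") as pipe:
            for chunk in chunks:
                pipe.write(chunk)

    writer = threading.Thread(target=producer)
    writer.start()
    received = []
    try:
        with open(fifo_path, "rb") as pipe:
            while True:
                data = pipe.read(4096)
                if not data:  # an empty read means the writer closed the pipe
                    break
                received.append(data)
    finally:
        writer.join()
        os.remove(fifo_path)
        os.rmdir(fifo_dir)
    return b"".join(received)


streamed = stream_through_fifo([b"record-1\n", b"record-2\n", b"record-3\n"])
print(streamed)
```

Nothing is written to disk in between: the data exists only in the kernel's pipe buffer, which is why training can begin before the full dataset has arrived.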

In this tutorial, we show you how to train a TensorFlow estimator using data read with SageMaker Pipe Mode. We use the SageMaker PipeModeDataset class - a special TensorFlow Dataset built specifically to read from SageMaker Pipe Mode data. This Dataset is available in our TensorFlow containers for TensorFlow versions 1.7.0 and up. It's also open-sourced at https://github.com/aws/sagemaker-tensorflow-extensions and can be built into custom TensorFlow images for use in SageMaker.

Although you can also build the PipeModeDataset into your own containers, in this tutorial we'll show how you can use the PipeModeDataset by launching training from the SageMaker Python SDK. The SageMaker Python SDK helps you deploy your models for training and hosting in optimized, production-ready containers in SageMaker. The SageMaker Python SDK is easy to use, modular, extensible and compatible with TensorFlow and many other deep learning frameworks.

Different collections of S3 files can be made available to the training container while it's running. These are referred to as "channels" in SageMaker. In this example, we use two channels - one for training data and one for evaluation data. Each channel is mapped to S3 files from different directories. The SageMaker PipeModeDataset knows how to read from the named pipes for each channel given just the channel name. When we launch SageMaker training we tell SageMaker what channels we have and where in S3 to read the data for each channel.
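As a rough illustration of what "given just the channel name" means, the sketch below builds the path of the named pipe for a channel. It assumes the `/opt/ml/input/data/<channel>_<epoch>` naming that the SageMaker documentation describes for Pipe Mode; the helper `pipe_path` and the example bucket are hypothetical and are not part of `PipeModeDataset`.

```python
INPUT_DATA_DIR = "/opt/ml/input/data"  # assumed location inside the training container


def pipe_path(channel, epoch=0):
    """Return the assumed FIFO path for a channel and epoch number."""
    return "{}/{}_{}".format(INPUT_DATA_DIR, channel, epoch)


# Hypothetical channel-to-S3 mapping, in the same shape that fit() accepts.
channels = {
    "train": "s3://example-bucket/pipe-mode/train",
    "eval": "s3://example-bucket/pipe-mode/eval",
}

for name, s3_uri in channels.items():
    print("{} -> {} (streamed from {})".format(name, pipe_path(name), s3_uri))
```

When you use `PipeModeDataset` you never build these paths yourself; the sketch only shows why the channel name is enough to locate the data.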


## Setup
The following code snippet sets up some variables we'll need later on.

In [1]:
from sagemaker import get_execution_role
from sagemaker.session import Session

# S3 bucket for saving code and model artifacts.
# Feel free to specify a different bucket here if you wish.
bucket = Session().default_bucket()

# Location to save your custom code in tar.gz format.
custom_code_upload_location = "s3://{}/tensorflow_scriptmode_pipemode/customcode".format(bucket)

# Location where results of model training are saved.
model_artifacts_location = "s3://{}/tensorflow_scriptmode_pipemode/artifacts".format(bucket)

# IAM execution role that gives SageMaker access to resources in your AWS account.
role = get_execution_role()

## Complete training source code

In this tutorial we train a TensorFlow LinearClassifier using pipe mode data. The TensorFlow training script is contained in the following file:

In [2]:
!pygmentize "pipemode.py"

import argparse
import json
import os

import tensorflow as tf
from sagemaker_tensorflow import PipeModeDataset
from tensorflow.contrib.data import map_and_batch

PREFETCH_SIZE = 10
BATCH_SIZE = 64
NUM_PARALLEL_BATCHES = 2
DIMENSION = 1024
EPOCHS = 1


def train_input_fn():
    """Returns input function that would feed the model during training"""
    return _input_fn("train")


def eval_i

The above script is compatible with the SageMaker TensorFlow script mode container. (See: [Preparing TensorFlow Training Script](https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow#preparing-a-script-mode-training-script)).

To train an estimator from a Pipe Mode channel, we write an input function that reads from the channel and returns a `PipeModeDataset`. This is a TensorFlow Dataset created specifically to read from a SageMaker Pipe Mode channel. A `PipeModeDataset` is a fully featured TensorFlow Dataset and can be used in exactly the same way as a regular TensorFlow Dataset.

The training and evaluation data used in this tutorial is synthetic. It consists of a series of records, each stored as a TensorFlow Example protobuf. Each record contains a numeric class label and an array of 1024 floating point numbers. Each array is sampled from a multi-dimensional Gaussian distribution with a class-specific mean, so a TensorFlow LinearClassifier can learn to separate the classes well. Records are delimited using RecordIO encoding (the `PipeModeDataset` class also supports the TFRecord format).
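To give a feel for how RecordIO delimits records, here is a minimal sketch of the framing: a magic number, a length field, the payload, and zero padding to a 4-byte boundary. The magic constant and the little-endian layout are assumptions based on the RecordIO helper in the SageMaker Python SDK, and the sketch ignores the flag bits used for records split across frames. The function names are illustrative only.

```python
import io
import struct

RECORDIO_MAGIC = 0xCED7230A  # assumed RecordIO magic number


def write_record(stream, payload):
    """Append one framed record: magic, length, payload, zero padding."""
    stream.write(struct.pack("<I", RECORDIO_MAGIC))
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)
    stream.write(b"\x00" * (-len(payload) % 4))  # pad to a 4-byte boundary


def read_records(stream):
    """Yield the payload of each framed record until the stream is exhausted."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        magic, length = struct.unpack("<II", header)
        if magic != RECORDIO_MAGIC:
            raise ValueError("unexpected magic number in record header")
        payload = stream.read(length)
        stream.read(-length % 4)  # skip the padding
        yield payload


buffer = io.BytesIO()
for payload in (b"abc", b"", b"12345678"):
    write_record(buffer, payload)
buffer.seek(0)
decoded = list(read_records(buffer))
print(decoded)
```

In the tutorial's data, each payload would be a serialized TensorFlow Example rather than the short byte strings used here.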

The training and evaluation data were produced using the benchmarking source code in the sagemaker-tensorflow-extensions benchmarking sub-package. If you want to investigate this further, please visit the GitHub repository for sagemaker-tensorflow-extensions at https://github.com/aws/sagemaker-tensorflow-extensions.
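The sketch below is not the benchmarking code. It is a simplified, standard-library illustration of the kind of data described above: a label plus a 1024-element vector drawn from a Gaussian whose mean depends on the class. The helper `make_example`, the `class_separation` parameter, and the choice of shifting every coordinate by the same amount are all assumptions made for brevity.

```python
import random

DIMENSION = 1024


def make_example(label, rng, dimension=DIMENSION, class_separation=1.0):
    """Return (label, vector) with each coordinate drawn from N(label * separation, 1)."""
    mean = label * class_separation
    return label, [rng.gauss(mean, 1.0) for _ in range(dimension)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
examples = [make_example(label, rng) for label in (0, 1) for _ in range(5)]


def class_average(label):
    """Average of all coordinates over every example with the given label."""
    values = [x for lab, vec in examples if lab == label for x in vec]
    return sum(values) / len(values)


print(class_average(0), class_average(1))
```

Because the class means differ, even the average coordinate value separates the two classes, which is why a linear model is sufficient for this data.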

The following example code shows how to construct a `PipeModeDataset`.

```python
import tensorflow as tf
from sagemaker_tensorflow import PipeModeDataset

# Simple example data - a labeled vector (TensorFlow 1.x API names).
features = {
    'data': tf.FixedLenFeature([], tf.string),
    'labels': tf.FixedLenFeature([], tf.int64),
}

# Parse serialized record bytes into a (features, label) pair.
def parse(record):
    parsed = tf.parse_single_example(record, features)
    return ({
        'data': tf.decode_raw(parsed['data'], tf.float64)
    }, parsed['labels'])

# Construct a PipeModeDataset reading from a 'training' channel, using
# the TFRecord encoding.
ds = PipeModeDataset(channel='training', record_format='TFRecord')

# The PipeModeDataset is a TensorFlow Dataset and provides standard Dataset methods.
ds = ds.repeat(20)
ds = ds.prefetch(10)
ds = ds.map(parse, num_parallel_calls=10)
ds = ds.batch(64)
```
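The `parse` function above stores the vector as raw bytes and reinterprets them as 64-bit floats. The following standard-library sketch mimics that reinterpretation so you can see what the `data` field holds. It assumes little-endian byte order, and `encode_vector` and `decode_vector` are illustrative helpers, not TensorFlow functions.

```python
import struct


def encode_vector(values):
    """Pack floats into raw little-endian float64 bytes."""
    return struct.pack("<{}d".format(len(values)), *values)


def decode_vector(raw):
    """Reinterpret raw bytes as a list of little-endian float64 values."""
    if len(raw) % 8 != 0:
        raise ValueError("byte length must be a multiple of 8 for float64 data")
    return list(struct.unpack("<{}d".format(len(raw) // 8), raw))


raw = encode_vector([1.0, -2.5, 3.25])
print(len(raw), decode_vector(raw))
```

A 1024-element vector therefore occupies 8192 bytes in the `data` feature of each record.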

## Running training using the Python SDK

We can use the SDK to run our local training script on SageMaker infrastructure.

1. Pass the path to the pipemode.py file, which contains the functions for defining your estimator, to the ``sagemaker.tensorflow.TensorFlow`` init method.
2. Pass the S3 location that we uploaded our data to previously to the ``fit()`` method.

In [51]:
# from sagemaker.tensorflow import TensorFlow

# tensorflow = TensorFlow(
#     entry_point="pipemode.py",
#     role=role,
#     framework_version="1.15.3",
#     input_mode="Pipe",
#     output_path=model_artifacts_location,
#     code_location=custom_code_upload_location,
#     train_instance_count=1,
#     py_version="py3",
#     train_instance_type="ml.c4.xlarge",
# #     train_instance_type="local",
# )


from sagemaker.tensorflow import TensorFlow

# SageMaker Python SDK v2 renamed train_instance_count and train_instance_type
# to instance_count and instance_type.
# See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
tensorflow = TensorFlow(
    entry_point="pipemode_2_3.py",
    role=role,
    framework_version="2.2",
    input_mode="Pipe",
    output_path=model_artifacts_location,
    code_location=custom_code_upload_location,
    instance_count=1,
    py_version="py37",
    instance_type="ml.c4.xlarge",
    # instance_type="local",
)
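If you have estimator arguments written for version 1 of the SageMaker Python SDK, the two renames reported for this notebook can be applied mechanically. The helper below is a hypothetical convenience, not part of the SDK, and it covers only `train_instance_count` and `train_instance_type`; consult the v2 migration guide for the full list of changes.

```python
# Only the two renames that this notebook triggers; other v2 changes are not covered.
V2_RENAMES = {
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
}


def to_v2_kwargs(kwargs):
    """Return a copy of kwargs with the v1 argument names replaced by v2 names."""
    return {V2_RENAMES.get(name, name): value for name, value in kwargs.items()}


v1_kwargs = {
    "entry_point": "pipemode_2_3.py",
    "input_mode": "Pipe",
    "train_instance_count": 1,
    "train_instance_type": "ml.c4.xlarge",
}
v2_kwargs = to_v2_kwargs(v1_kwargs)
print(v2_kwargs)
```

The resulting dictionary could then be passed as keyword arguments when constructing the estimator.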


After we've created the SageMaker Python SDK TensorFlow object, we can call ``fit()`` to launch TensorFlow training:

In [24]:
%%bash 
sudo service docker stop
mkdir ~/SageMaker/docker_disk
sudo mv /var/lib/docker ~/SageMaker/docker_disk/
sudo ln -s  ~/SageMaker/docker_disk/docker/ /var/lib/
sudo service docker start

Redirecting to /bin/systemctl stop docker.service
  docker.socket
mkdir: cannot create directory ‘/home/ec2-user/SageMaker/docker_disk’: File exists
mv: ‘/var/lib/docker’ and ‘/home/ec2-user/SageMaker/docker_disk/docker’ are the same file
ln: failed to create symbolic link ‘/var/lib/docker’: File exists
Redirecting to /bin/systemctl start docker.service


In [25]:
!aws s3 cp --recursive  s3://sagemaker-sample-data-us-west-2/tensorflow/pipe-mode/  tensorflow-pipe-mode

download: s3://sagemaker-sample-data-us-west-2/tensorflow/pipe-mode/eval/file_000002.recordio to tensorflow-pipe-mode/eval/file_000002.recordio
download: s3://sagemaker-sample-data-us-west-2/tensorflow/pipe-mode/eval/file_000000.recordio to tensorflow-pipe-mode/eval/file_000000.recordio
download: s3://sagemaker-sample-data-us-west-2/tensorflow/pipe-mode/eval/file_000001.recordio to tensorflow-pipe-mode/eval/file_000001.recordio
download: s3://sagemaker-sample-data-us-west-2/tensorflow/pipe-mode/eval/file_000003.recordio to tensorflow-pipe-mode/eval/file_000003.recordio
download: s3://sagemaker-sample-data-us-west-2/tensorflow/pipe-mode/eval/file_000004.recordio to tensorflow-pipe-mode/eval/file_000004.recordio
download: s3://sagemaker-sample-data-us-west-2/tensorflow/pipe-mode/train/file_000000.recordio to tensorflow-pipe-mode/train/file_000000.recordio
download: s3://sagemaker-sample-data-us-west-2/tensorflow/pipe-mode/train/file_000003.recordio to tensorflow-pipe-mode/train/file_0000

In [19]:
!pip install -U sagemaker

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting sagemaker
  Downloading sagemaker-2.70.0.tar.gz (466 kB)
Collecting boto3>=1.20.18
  Downloading boto3-1.20.21-py3-none-any.whl (131 kB)
Collecting botocore<1.24.0,>=1.23.21
  Downloading botocore-1.23.21-py3-none-any.whl (8.4 MB)
Building wheels for collected packages: sagemaker
  Building wheel for sagemaker (setup.py) ... done
  Created wheel for sagemaker: filename=sagemaker-2.70.0-py2.py3-none-any.whl size=649149 sha256=04c7305bb36b07afa6e92ce0c4b8d1878016925c62f8a20d6842cdaaa6c86d35
  Stored in directory: /home/ec2-user/.cache/pip/wheels/da/11/20/c45ef599886a2b1399effa68f80b98b2166dc624e19636c303
Successfully built sagemaker
Installing collected packages: botocore

In [53]:
%%time
import boto3

# Use the region-specific sample data bucket, with a separate S3 prefix per channel.
region = boto3.Session().region_name

train_data = "s3://sagemaker-sample-data-{}/tensorflow/pipe-mode/train".format(region)
eval_data = "s3://sagemaker-sample-data-{}/tensorflow/pipe-mode/eval".format(region)

# To train on your own data instead, point both variables at your own S3 prefixes, for example:
# train_data = "s3://sagemaker-us-west-2-230755935769/experiment/"
# eval_data = "s3://sagemaker-us-west-2-230755935769/experiment/"

tensorflow.fit({"train": train_data, "eval": eval_data})

2021-12-09 04:31:50 Starting - Starting the training job...
2021-12-09 04:32:13 Starting - Launching requested ML instancesProfilerReport-1639024309: InProgress
...
2021-12-09 04:32:49 Starting - Preparing the instances for training............
2021-12-09 04:34:48 Downloading - Downloading input data
2021-12-09 04:34:48 Training - Downloading the training image...
2021-12-09 04:35:14 Training - Training image download completed. Training in progress.
2021-12-09 04:35:05.905592: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:425] Initializing the SageMaker Profiler.
2021-12-09 04:35:05.913596: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:106] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
2021-12-09 04:35:06.092766: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:425] Initializing the SageMaker Profiler.
2021-12-09 04:35:10,093 sagemaker-trainin

2021-12-09 04:35:10.975689: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:425] Initializing the SageMaker Profiler.
2021-12-09 04:35:10.975825: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:106] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
2021-12-09 04:35:10.996121: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:425] Initializing the SageMaker Profiler.
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 's3://sagemaker-us-west-2-230755935769/tensorflow_scriptmode_pipemode/artifacts/tensorflow-training-2021-12-09-04-12-53-857/model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_


2021-12-09 04:35:38 Uploading - Uploading generated training model
2021-12-09 04:35:38 Completed - Training job completed
Training seconds: 70
Billable seconds: 70
CPU times: user 717 ms, sys: 14.5 ms, total: 732 ms
Wall time: 4min 12s


After training finishes, the trained model artifacts are uploaded to S3. The following example notebook shows how to deploy a model trained with script mode: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/tensorflow_script_mode_training_and_serving