# Using SageMaker Data Parallel

In this example, we will learn how to use the proprietary SageMaker Distributed Data Parallel library ("SDDP"). The SDDP library provides a proprietary implementation of data parallelism that integrates natively with other SageMaker capabilities. SDDP is packaged in SageMaker DL containers and supports both the TensorFlow 2 and PyTorch frameworks.

As a training task, we use the same binary classification computer vision (CV) problem as in the PyTorch DDP sample. Since SDDP is natively supported by SageMaker, we don’t need to develop any custom launcher utilities. As a result, we only have to make minimal changes to the training script and job configuration to enable SDDP.

We start with imports and data preparation.

In [None]:
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()
role = get_execution_role()

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/sm-dataparallel-distribution-options'
print('Bucket:\n{}'.format(bucket))

In [None]:
# Data preparation was already done in Chapter06/2_distributed_training_PyTorch.ipynb.
# If you skipped it, run the code below.

! wget https://download.pytorch.org/tutorial/hymenoptera_data.zip
! unzip hymenoptera_data.zip
data_url = sagemaker_session.upload_data(path="./hymenoptera_data", key_prefix="hymenoptera_data")

## Running SDDP Training Job

To run an SDDP-enabled training job, we need to make minor modifications to our training script and training job configuration. Let's review them in detail.

### Modifying Training Script

Starting with version 1.4.0, the SDDP library is integrated with the PyTorch DDP package that we used in the previous example, where it is available as a specific backend option. This significantly reduces the changes needed to use SDDP. In fact, if you already have a DDP-enabled training script, you only need to add an import of the `torch_smddp` package and use the `smddp` communication backend when initializing the process group, as follows:

```python
import smdistributed.dataparallel.torch.torch_smddp
import torch.distributed as dist
dist.init_process_group(backend='smddp')
```
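
If the same script also needs to run outside SageMaker, one option is to choose the backend at start-up. The following is a minimal sketch and not the sample's actual training script: `pick_backend` and `init_distributed` are hypothetical helper names, and falling back to another backend (here `gloo`) when the SDDP plugin cannot be imported is an assumption about how you might want local runs to behave.

```python
import importlib


def pick_backend(fallback: str = "gloo") -> str:
    """Return "smddp" if the SDDP PyTorch plugin imports cleanly, else `fallback`."""
    try:
        # Same module as in the snippet above; importing it is what makes
        # the "smddp" backend name available to torch.distributed.
        importlib.import_module("smdistributed.dataparallel.torch.torch_smddp")
    except ImportError:
        # Assumption: the plugin is not installed, for example on a laptop.
        return fallback
    return "smddp"


def init_distributed(fallback: str = "gloo") -> str:
    """Initialize the default process group and return the backend that was used."""
    # Imported lazily so that pick_backend() can be used without PyTorch installed.
    import torch.distributed as dist

    backend = pick_backend(fallback)
    # Relies on the launcher having set the usual rendezvous environment
    # variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE).
    dist.init_process_group(backend=backend)
    return backend
```

Inside an SDDP-enabled SageMaker container, `pick_backend()` would be expected to return `"smddp"`; elsewhere it returns the fallback you pass in.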

Keep in mind that SDDP v1.4 is only available with the latest PyTorch v1.10 DL containers. For earlier versions, the SDDP API is slightly different. For more details, please refer to the official API documentation [here](https://sagemaker.readthedocs.io/en/stable/api/training/distributed.html#the-sagemaker-distributed-data-parallel-library).

Execute the cell below to review the training script in full.

In [None]:
! pygmentize 3_sources/train_sm_dp.py

### Training Job Configuration

Starting an SDDP job requires you to provide a `distribution` object with the data parallelism configuration. In `distribution`, we specify that we want to run the `dataparallel` job type. You can also provide additional MPI configuration in the `custom_mpi_options` parameter.

```python
distribution = {
    "smdistributed": {
        "dataparallel": {
            "enabled": True,
            "custom_mpi_options": "-verbose -x NCCL_DEBUG=VERSION"
        }
    }
}
```
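
When the MPI flags change between experiments, the same dictionary can be assembled by a small helper. This is only a sketch: `build_distribution` is a hypothetical name and not part of the SageMaker SDK, and omitting `custom_mpi_options` when no flags are given is an assumption about the preferred default.

```python
from typing import Optional


def build_distribution(custom_mpi_options: Optional[str] = None) -> dict:
    """Build the `distribution` argument for an SDDP-enabled estimator."""
    dataparallel = {"enabled": True}
    if custom_mpi_options:
        # Extra MPI flags, passed through as a single string.
        dataparallel["custom_mpi_options"] = custom_mpi_options
    return {"smdistributed": {"dataparallel": dataparallel}}


# Reproduces the configuration shown above.
distribution = build_distribution("-verbose -x NCCL_DEBUG=VERSION")
```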


Another thing to keep in mind is that SDDP is only available for a limited set of multi-GPU instance types: `ml.p3.16xlarge`, `ml.p3dn.24xlarge`, and `ml.p4d.24xlarge`. Execute the cell below to start the SDDP training job.

In [None]:
from sagemaker.pytorch import PyTorch

instance_type = 'ml.p3.16xlarge'  # must be one of the SDDP-supported multi-GPU instance types
instance_count = 2

distribution = { 
    "smdistributed": { 
        "dataparallel": {
            "enabled": True, 
            "custom_mpi_options": "-verbose -x NCCL_DEBUG=VERSION"
        }
    }
}

sm_dp_estimator = PyTorch(
    entry_point="train_sm_dp.py",  # the SDDP-enabled script reviewed above
    source_dir='3_sources',
    role=role,
    instance_type=instance_type,
    instance_count=instance_count,  # use the value defined above instead of a hardcoded 1
    sagemaker_session=sagemaker_session,
    framework_version='1.10',  # the smddp backend requires the PyTorch 1.10 containers
    py_version='py38',
    hyperparameters={
        "batch-size": 64,
        "epochs": 20,
        "model-name": "squeezenet",
        "num-classes": 2,
        "feature-extract": True,
        "sync-s3-path": f"s3://{bucket}/distributed-training/output"
    },
    disable_profiler=True,
    debugger_hook_config=False,
    distribution=distribution,
    base_job_name="SM-DP",
)
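
Because an unsupported instance type would only surface as an error once the job is submitted, it can be convenient to check it in the notebook first. The sketch below hardcodes the instance types listed in this section, which may be out of date for your SageMaker version; `check_sddp_instance` is a hypothetical helper, not an SDK function.

```python
# Instance types named in this section as supporting SDDP (may change over time).
SDDP_INSTANCE_TYPES = ("ml.p3.16xlarge", "ml.p3dn.24xlarge", "ml.p4d.24xlarge")


def check_sddp_instance(instance_type: str) -> None:
    """Raise ValueError if `instance_type` is not in the supported list above."""
    if instance_type not in SDDP_INSTANCE_TYPES:
        supported = ", ".join(SDDP_INSTANCE_TYPES)
        raise ValueError(
            f"{instance_type} is not listed as SDDP-capable; expected one of: {supported}"
        )
```

Calling `check_sddp_instance(instance_type)` before `fit` turns a misconfigured instance type into an immediate, local error.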

In [None]:
sm_dp_estimator.fit(inputs={"train":f"{data_url}/train", "val":f"{data_url}/val"})

## Summary

In this example, we learned how, with minimal modifications, you can use the SDDP library to run distributed data parallel jobs.