## Training GCMC using DGL with the MXNet backend on SageMaker
The **SageMaker Python SDK** makes it easy to train DGL models. In this example, we train a [Graph Convolutional Matrix Completion](https://arxiv.org/abs/1706.02263) (GCMC) network using the [DMLC DGL API](https://github.com/dmlc/dgl.git) and the [MovieLens dataset](https://grouplens.org/datasets/movielens/). Three MovieLens datasets are currently supported:
 * MovieLens 100K: stable benchmark dataset with 100,000 movie ratings from 1,000 users on 1,700 movies.
 * MovieLens 1M: stable benchmark dataset with 1 million movie ratings from 6,000 users on 4,000 movies.
 * MovieLens 10M: stable benchmark dataset with 10 million movie ratings and 100,000 tag applications applied to 10,000 movies by 72,000 users.
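
GCMC treats recommendation as filling in a sparse user-by-movie rating matrix. As a rough illustration, the sketch below computes how much of that matrix is observed for each dataset, using the rounded counts listed above (so the results are only approximate):

```python
# Approximate sizes taken from the dataset list above (rounded figures).
DATASETS = {
    "MovieLens 100K": {"ratings": 100_000, "users": 1_000, "movies": 1_700},
    "MovieLens 1M": {"ratings": 1_000_000, "users": 6_000, "movies": 4_000},
    "MovieLens 10M": {"ratings": 10_000_000, "users": 72_000, "movies": 10_000},
}


def rating_density(ratings: int, users: int, movies: int) -> float:
    """Fraction of the users x movies rating matrix that has an observed entry."""
    return ratings / (users * movies)


for name, size in DATASETS.items():
    density = rating_density(size["ratings"], size["users"], size["movies"])
    print(f"{name}: {density:.2%} of entries observed")
```

With these rounded counts, only about 5.9%, 4.2% and 1.4% of the entries are observed, respectively; the remaining entries are the missing ratings a matrix completion model predicts.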

### Prepare
First, install the necessary packages.

```python
!conda install -y boto3
!conda install -c anaconda -y botocore
```

```python
import sagemaker
from sagemaker import get_execution_role

# Set up a SageMaker session
sess = sagemaker.Session()

# S3 bucket for saving code and model artifacts.
# Feel free to specify a different bucket here if you wish.
bucket = sess.default_bucket()

# S3 key prefix for uploading your custom code.
custom_code_upload_location = 'customcode'

# S3 location where the results of model training are saved.
model_artifacts_location = 's3://{}/artifacts'.format(bucket)

# IAM execution role that gives SageMaker access to resources in your AWS account.
# We can use the SageMaker Python SDK to get the role from our notebook environment.
role = get_execution_role()
```

### The training script
The `train.py` script contains all the code needed to train the GCMC model on SageMaker.

```python
!cat train.py
```

### Bring Your Own GCMC Docker Image
AWS provides base Docker images through [AWS Deep Learning Containers](https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers-images.html); DGL is pre-installed in both the PyTorch 1.3 and MXNet 1.6 images. Because this example needs additional dependencies, we provide a Dockerfile (`GCMC.Dockerfile`) to build a new image. Build this GCMC-specific Docker image and push it to your Amazon ECR registry.

Note: If you are working in a different AWS region, update `GCMC.Dockerfile` to match your region.

```sh
%%sh
# Look up the current AWS account ID and region
account=$(aws sts get-caller-identity --query Account --output text)
echo $account
region=$(aws configure get region)

docker_name=sagemaker-dgl-gcmc

# Log in to the AWS Deep Learning Containers registry (account 763104351884) to pull the base image
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin 763104351884.dkr.ecr.${region}.amazonaws.com
docker build -t ${docker_name} -f GCMC.Dockerfile .

# Log in to your own ECR registry so the new image can be pushed
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${docker_name}:latest"
# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${docker_name}" > /dev/null 2>&1
if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${docker_name}" > /dev/null
fi

docker tag ${docker_name} ${fullname}

docker push ${fullname}
```
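
The shell cell takes the region from `aws configure get region`, while the estimator cell below builds the image URI from the SageMaker session's region; if the two differ, the URI points at a repository that was never pushed to. The following is a small hypothetical helper (`parse_ecr_image_uri` and `check_region` are not part of the SageMaker SDK) that splits a private ECR image URI into its parts. It assumes the standard `amazonaws.com` domain, so AWS China regions (`amazonaws.com.cn`) are not handled:

```python
import re

# Pattern of a private ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>[\w.-]+)$"
)


def parse_ecr_image_uri(uri: str) -> dict:
    """Split an ECR image URI into account, region, repository and tag."""
    match = ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri}")
    return match.groupdict()


def check_region(uri: str, session_region: str) -> None:
    """Raise if the image lives in a different region than the SageMaker session."""
    image_region = parse_ecr_image_uri(uri)["region"]
    if image_region != session_region:
        raise ValueError(f"image is in {image_region}, session is in {session_region}")
```

For example, `check_region(image, sess.boto_session.region_name)` could be run right after `image` is built in the estimator cell.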

### SageMaker's estimator class
The SageMaker Estimator lets us run training on a single machine in SageMaker, using CPU- or GPU-based instances.

When we create the estimator, we pass in the filename of our training script (`entry_point`), the directory containing the code (`source_dir`), our IAM execution role, and the Docker image to use. `train_instance_count` and `train_instance_type` determine the number and type of SageMaker instances used for the training job. The `hyperparameters` parameter is a dict of values passed to the training script; you can see how these values are accessed in the `train.py` script above.
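
How `train.py` reads its hyperparameters is defined by the script itself. As an assumption-labeled sketch: in SageMaker script mode, hyperparameters are typically passed to the entry point as command-line flags such as `--data_name ml-1m`, which a script can parse with `argparse`. The function name and default values below are hypothetical; the flag names mirror the keys of the `params` dict in the estimator cell:

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse hyperparameters a training script may receive as command-line flags.

    Hypothetical sketch: flag names follow the notebook's `params` dict,
    defaults are illustrative only.
    """
    parser = argparse.ArgumentParser(description="GCMC training (sketch)")
    parser.add_argument("--data_name", type=str, default="ml-1m")
    parser.add_argument("--save_dir", type=str, default="/opt/ml/model/")
    # parse_known_args tolerates extra flags this sketch does not declare
    args, _unknown = parser.parse_known_args(argv)
    return args
```

For example, `parse_hyperparameters(["--data_name", "ml-100k"])` returns a namespace with `data_name == "ml-100k"` and the default `save_dir`.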

The entry point of a SageMaker DGL Docker image (e.g., `dgllib/dgl-sagemaker-gpu:dgl_0.4_mxnet_1.5.1`) is a **train** script under `/usr/bin/`. The **train** script inside the DGL Docker image mentioned above reads the **real entry point** from the hyperparameters and runs it from the **'training-code' data channel** (`/opt/ml/input/data/training-code/`).
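
The dispatch step described above could look roughly like the following. This is a hypothetical sketch, not the script shipped in the DGL image: it assumes the entry point is stored under the `sagemaker_program` hyperparameter (the key the SageMaker Python SDK uses for `entry_point`) and that every value is a JSON-encoded string, as the SDK writes them for framework estimators. SageMaker itself places the hyperparameters in `/opt/ml/input/config/hyperparameters.json`:

```python
import json
import os
import sys

# Standard SageMaker locations inside the training container.
HYPERPARAMETERS_FILE = "/opt/ml/input/config/hyperparameters.json"
CODE_CHANNEL_DIR = "/opt/ml/input/data/training-code"


def load_hyperparameters(path=HYPERPARAMETERS_FILE):
    """Read the raw hyperparameters dict that SageMaker writes into the container."""
    with open(path) as f:
        return json.load(f)


def build_train_command(hyperparameters, code_dir=CODE_CHANNEL_DIR, entry_key="sagemaker_program"):
    """Turn hyperparameters into the command that runs the real entry point.

    Assumption: values are JSON-encoded strings and the entry point lives under entry_key.
    """
    decoded = {key: json.loads(value) for key, value in hyperparameters.items()}
    entry_point = decoded.pop(entry_key)
    command = [sys.executable, os.path.join(code_dir, entry_point)]
    for key, value in decoded.items():
        if key.startswith("sagemaker_"):
            continue  # internal SDK settings are not forwarded to the script
        command += [f"--{key}", str(value)]
    return command
```

The resulting list could then be executed with `subprocess.run(command, check=True)`.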

In this example, we upload the whole code base (including `train.py`) into the SageMaker container and run GCMC training on the MovieLens dataset.
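
When `source_dir` is set, the SageMaker Python SDK packages that directory into a `sourcedir.tar.gz` archive and uploads it to S3 before the job starts. The sketch below mimics that packaging step using only the standard library; `package_source_dir` is a hypothetical helper, not the SDK's implementation, and it additionally skips `__pycache__` directories:

```python
import os
import tarfile


def package_source_dir(source_dir, output_path):
    """Bundle every file under source_dir into a gzipped tarball.

    Paths inside the archive are relative to source_dir, so the entry point
    (e.g. train.py) sits at the top level after extraction.
    """
    with tarfile.open(output_path, "w:gz") as tar:
        for root, dirs, files in os.walk(source_dir):
            dirs[:] = [d for d in dirs if d != "__pycache__"]  # skip bytecode caches
            for name in sorted(files):
                full_path = os.path.join(root, name)
                tar.add(full_path, arcname=os.path.relpath(full_path, source_dir))
    return output_path
```

Write the archive outside `source_dir`; otherwise the walk can pick up the archive being written.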

```python
from sagemaker.mxnet.estimator import MXNet

# Name of the GCMC Docker image built and pushed above
docker_name = 'sagemaker-dgl-gcmc'

# Local code base (including the entry point) to upload into the container
CODE_PATH = '../dgl_gcmc'
CODE_ENTRY = 'train.py'

# Full ECR URI of the image in your account and region
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, docker_name)
print(image)

params = {}
# MovieLens dataset to train on
params['data_name'] = 'ml-1m'
# Save outputs to the SageMaker model directory so they are uploaded to S3
params['save_dir'] = '/opt/ml/model/'

estimator = MXNet(entry_point=CODE_ENTRY,
                  source_dir=CODE_PATH,
                  role=role,
                  train_instance_count=1,
                  train_instance_type='ml.p3.2xlarge',
                  image_name=image,
                  output_path=model_artifacts_location,
                  hyperparameters=params,
                  sagemaker_session=sess)
```

### Running the Training Job
After constructing the Estimator object, we start training by calling `fit()`; the MovieLens dataset is downloaded automatically during training. Below, we run SageMaker training with a single channel, training-code, which contains the code to run.

```python
estimator.fit()
```

### Output
You can find the model training output in the SageMaker console: search for the training job and look for its 'S3 model artifact' location.
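
The same location is also available programmatically: after `fit()` returns, `estimator.model_data` holds the S3 URI of the model artifact, typically `<output_path>/<training-job-name>/output/model.tar.gz`. A minimal sketch for downloading it, assuming `boto3` is installed and the notebook role can read the bucket; `split_s3_uri` and `download_model_artifact` are hypothetical helpers:

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key/...' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_model_artifact(uri, local_path="model.tar.gz"):
    """Download the model artifact; boto3 is imported lazily so split_s3_uri works without it."""
    import boto3

    bucket, key = split_s3_uri(uri)
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path
```

For example: `download_model_artifact(estimator.model_data)`.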