# SageMaker TorchPoints3d

This notebook is for creating Amazon SageMaker training and inference containers for PyTorch Points 3D. 

Getting Started:

1. Clone the PyTorch Points 3D [GitHub repository](https://github.com/nicolas-chaulet/torch-points3d) and download this notebook into the working directory.

```
$ git clone https://github.com/nicolas-chaulet/torch-points3d
```

`NOTE`: This code is under active development; it was tested against the July 22 commit `12dfca3e94add3981f7f37db25df1e5acee640fe`.

This notebook will take you through the following steps:

1. Build and register Training Containers
2. Build and register Inference Container
3. Train model
4. Run Inference
5. Visualize results


In [None]:
!mkdir -p docker-train docker-inference

### Training Container

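Update the Hydra [working directory](https://hydra.cc/docs/configure_hydra/workdir) and logging configuration so that Hydra writes its outputs to the SageMaker model directory and its log file to the SageMaker output data directory.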

In [None]:
%%writefile conf/hydra/job_logging/custom.yaml
hydra:
    run:
        dir: ${env:SM_MODEL_DIR}
    job_logging:
        formatters:
            simple:
                format: "%(message)s"
        root:
            handlers: [debug_console_handler, file_handler]
        version: 1
        handlers:
            debug_console_handler:
                level: DEBUG
                formatter: simple
                class: logging.StreamHandler
                stream: ext://sys.stdout
            file_handler:
                level: DEBUG
                formatter: simple
                class: logging.FileHandler
                filename: ${env:SM_OUTPUT_DATA_DIR}/train.log
        disable_existing_loggers: False
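
The `${env:SM_MODEL_DIR}` and `${env:SM_OUTPUT_DATA_DIR}` values above are resolved from environment variables by OmegaConf (which Hydra uses) when the config is loaded. As a rough illustration of that substitution, here is a stdlib sketch; the `resolve_env` helper is hypothetical and is not OmegaConf's actual resolver:

```python
import os
import re

# Matches interpolations of the form ${env:NAME}
_ENV_PATTERN = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value, environ=None):
    """Replace each ${env:NAME} in `value` with environ[NAME] (hypothetical helper).

    Raises KeyError if a referenced variable is not set.
    """
    environ = os.environ if environ is None else environ
    return _ENV_PATTERN.sub(lambda m: environ[m.group(1)], value)


# Example with the SageMaker directories used in the config above
env = {"SM_MODEL_DIR": "/opt/ml/model", "SM_OUTPUT_DATA_DIR": "/opt/ml/output/data"}
print(resolve_env("${env:SM_MODEL_DIR}", env))                  # /opt/ml/model
print(resolve_env("${env:SM_OUTPUT_DATA_DIR}/train.log", env))  # /opt/ml/output/data/train.log
```

If either variable is missing the lookup fails, which is why the local test run further down sets `SM_MODEL_DIR` and `SM_OUTPUT_DATA_DIR` explicitly.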

Create a SageMaker wrapper script that calls `train.py` with Hydra overrides and writes a `config.json` file that the inference container uses to load the model.

In [None]:
%%writefile sagemaker_train.py
import argparse
import json
import os
import subprocess
import sys

def main():
    # Get SageMaker parameters (hyperparameters arrive as --name value arguments)
    parser = argparse.ArgumentParser(description='SageMaker training wrapper for torch-points3d')
    parser.add_argument('--model_dir', default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--output_data_dir', default=os.environ.get('SM_OUTPUT_DATA_DIR'))
    parser.add_argument('--train_dir', default=os.environ.get('SM_CHANNEL_TRAINING'))
    parser.add_argument('--task', default='segmentation')
    parser.add_argument('--model_type', default='pointnet2')
    parser.add_argument('--model_name', default='pointnet2_charlesssg')
    parser.add_argument('--dataset_name', default='shapenet-fixed')
    parser.add_argument('--weight_name', default='miou')
    parser.add_argument('--forward_category', default='Cap')
    parser.add_argument('--epochs', default='100')
    parser.add_argument('--lr', default='0.001')
    parser.add_argument('--hydra_verbose', default='true')
    parser.add_argument('--hydra_pretty_print', default='true')
    args = parser.parse_args()

    # Pass in hydra configuration overrides
    train_cmd = ["python", "train.py",
        "hydra.run.dir={}".format(args.model_dir),
        "hydra.job_logging.handlers.file_handler.filename={}/train.log".format(args.output_data_dir),
        "data.dataroot={}".format(args.train_dir),
        "task={}".format(args.task),
        "model_type={}".format(args.model_type),
        "model_name={}".format(args.model_name),
        "dataset={}".format(args.dataset_name),
        "training.epochs={}".format(args.epochs),
        "training.optim.base_lr={}".format(args.lr),
        "hydra.verbose={}".format(args.hydra_verbose),
        "pretty_print={}".format(args.hydra_pretty_print),
        "wandb.log=false",
    ]

    # Output training inputs
    print(args.train_dir)
    print(os.listdir(args.train_dir))

    # Write config so the inference container can load the model type
    config = vars(args)
    print('saving config: {}'.format(config))
    with open(os.path.join(args.model_dir, 'config.json'), 'w') as f:
        json.dump(config, f, indent=4)

    # Call into subprocess, capturing both stdout and stderr
    print('running subprocess: {}'.format(' '.join(train_cmd)))
    p = subprocess.run(train_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    # Write the output and error to logs and return the error code
    if p.stdout is not None:
        print('process output:')
        print(p.stdout.decode('utf-8'))
    if p.stderr is not None:
        print('process error:')
        print(p.stderr.decode('utf-8'))
    return p.returncode

if __name__ == "__main__":
    # Propagate the training exit code so SageMaker marks failed jobs as failed
    sys.exit(main())

Alternatively, the commented-out cell below is a wrapper that accepts the same arguments but composes the Hydra config in-process and calls `Trainer.train()` directly, instead of running `train.py` in a subprocess.

In [None]:
# %%writefile sagemaker_train.py
# import argparse
# import os
# import json

# # imports for composing hydra
# import hydra
# from hydra.experimental import compose, initialize
# from omegaconf import OmegaConf
# from torch_points3d.trainer import Trainer

# # enable debug logging
# import logging
# logging.basicConfig(level=logging.DEBUG)

# def main():
#     # Get SageMaker parameters
#     parser = argparse.ArgumentParser(description='SageMaker training wrapper for torch-points3d')
#     parser.add_argument('--model_dir', default=os.environ.get('SM_MODEL_DIR'))
#     parser.add_argument('--train_dir', default=os.environ.get('SM_CHANNEL_TRAINING'))
#     parser.add_argument('--task', default='segmentation')
#     parser.add_argument('--model_type', default='pointnet2')
#     parser.add_argument('--model_name', default='pointnet2_charlesssg')
#     parser.add_argument('--dataset_name', default='shapenet-fixed') 
#     parser.add_argument('--epochs', default='100')
#     parser.add_argument('--lr', default='0.001')
#     parser.add_argument('--hydra_verbose', default='true')
#     parser.add_argument('--hydra_pretty_print', default='true')
#     args = parser.parse_args()
    
#     # Write config so can load model type for inference
#     config = vars(args)
#     print('saving config: {}'.format(config))
#     with open(os.path.join(args.model_dir, 'config.json'), 'w') as f:
#         json.dump(config, f, indent=4)    
    
#     # Update hydra config with these params
#     initialize(config_dir="conf")
#     cfg = compose("config.yaml", overrides=[        
#         "data.dataroot={}".format(args.train_dir),
#         "task={}".format(args.task),
#         "model_type={}".format(args.model_type),
#         "model_name={}".format(args.model_name),
#         "dataset={}".format(args.dataset_name), 
#         "training.epochs={}".format(args.epochs),
#         "training.optim.base_lr={}".format(args.lr),
#         "hydra.verbose={}".format(args.hydra_verbose),
#         "pretty_print={}".format(args.hydra_pretty_print),
#         "wandb.log=false"
#     ])
    
#     OmegaConf.set_struct(cfg, False)  # This allows getattr and hasattr methods to function correctly
#     if cfg.pretty_print:
#         print(cfg.pretty())

#     trainer = Trainer(cfg)
#     trainer.train()

#     # https://github.com/facebookresearch/hydra/issues/440
#     hydra._internal.hydra.GlobalHydra.get_state().clear()
#     return 0

# if __name__ == "__main__":
#     main()
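
Both wrappers translate their arguments into Hydra `key=value` override strings. The hypothetical `build_overrides` helper below is a stdlib sketch of that mapping for a subset of the overrides in `sagemaker_train.py`; it takes an `argparse.Namespace` like the wrapper's, which makes it easy to inspect the override list without launching training:

```python
import argparse

# Maps argparse attribute names to the Hydra config keys used by train.py
# (a subset of the override list in sagemaker_train.py above)
OVERRIDE_KEYS = {
    "train_dir": "data.dataroot",
    "task": "task",
    "model_type": "model_type",
    "model_name": "model_name",
    "dataset_name": "dataset",
    "epochs": "training.epochs",
    "lr": "training.optim.base_lr",
    "hydra_verbose": "hydra.verbose",
    "hydra_pretty_print": "pretty_print",
}


def build_overrides(args):
    """Return Hydra override strings for an argparse.Namespace (hypothetical helper)."""
    overrides = ["{}={}".format(key, getattr(args, attr))
                 for attr, key in OVERRIDE_KEYS.items()]
    overrides.append("wandb.log=false")  # disable Weights & Biases logging
    return overrides


parser = argparse.ArgumentParser()
parser.add_argument("--train_dir", default="/opt/ml/input/data/training")
parser.add_argument("--task", default="segmentation")
parser.add_argument("--model_type", default="pointnet2")
parser.add_argument("--model_name", default="pointnet2_charlesssg")
parser.add_argument("--dataset_name", default="shapenet-fixed")
parser.add_argument("--epochs", default="100")
parser.add_argument("--lr", default="0.001")
parser.add_argument("--hydra_verbose", default="true")
parser.add_argument("--hydra_pretty_print", default="true")

args = parser.parse_args(["--epochs", "3"])
print(build_overrides(args))
```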

Create the Dockerfile, which inherits from the PyTorch 1.7.1 GPU training base image, and make sure `requirements.txt` is excluded from the build context via `.dockerignore`.

In [None]:
%%writefile -a .dockerignore
requirements.txt

In [None]:
%%writefile docker-train/Dockerfile
ARG REGION=us-east-1

FROM 763104351884.dkr.ecr.$REGION.amazonaws.com/pytorch-training:1.7.1-gpu-py36-cu110-ubuntu18.04

#Install OS packages
RUN apt-get update \
    && apt-get install -y --fix-missing --no-install-recommends\
    libffi-dev libssl-dev build-essential libopenblas-dev libsparsehash-dev\
    python3-pip python3-dev python3-venv python3-setuptools\
    git iproute2 procps lsb-release \
    libsm6 libxext6 libxrender-dev ninja-build \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

#Install dependent libraries for torch-points-3d
#(version specifiers are quoted so the shell does not treat ">" as a redirect)
RUN python3 -m pip install -U pip \
    && pip3 install "setuptools>=41.0.0" \
#     && torch==1.7.0 torchvision==0.8.1 \
    && pip3 install torch==1.7.1 torchvision==0.8.2 \
    && pip3 install MinkowskiEngine --install-option="--force_cuda" --install-option="--cuda_home=/usr/local/cuda" \
#     && pip3 install git+https://github.com/mit-han-lab/torchsparse.git@f79df704e2fb3ea912c31d57e910ea0edba03da4 -v \
    && pip3 install torch-scatter torch-sparse -f https://pytorch-geometric.com/whl/torch-1.7.0+cu110.html \
    && pip3 install pycuda \
    && rm -rf /root/.cache

#Install and fix issues with torch-points-3d
COPY . /opt/ml/code

#RUN pip uninstall torch-scatter torch-sparse torch-cluster torch-points-kernels -y
RUN rm -rf ~/.cache/pip
RUN cd /opt/ml/code && pip3 install . 

#Sagemaker environment variables
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/ml/code:${PATH}"

#Sagemaker training script and dir
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code
ENV SAGEMAKER_PROGRAM sagemaker_train.py

### Inference Container

Write the model handler, which writes each request's input data to disk and calls into the torch-points3d library to run the forward pass.

In [None]:
%%writefile docker-inference/model_handler.py
import json
import logging
import os
import sys

import torch


DIR = os.path.dirname(os.path.realpath(__file__))
ROOT = os.path.join(DIR, "..")
sys.path.insert(0, ROOT)

log = logging.getLogger(__name__)

# Import building functions for the dataset
from torch_points3d.datasets.dataset_factory import instantiate_dataset, get_dataset_class

# Import BaseModel / BaseDataset for type checking
from torch_points3d.models.base_model import BaseModel
from torch_points3d.datasets.base_dataset import BaseDataset

# Import from metrics
from torch_points3d.metrics.colored_tqdm import Coloredtqdm as Ctq
from torch_points3d.metrics.model_checkpoint import ModelCheckpoint

class PyTorch3dPoint():
    def __init__(self):

        self.checkpoint_file_path = None
        self.model = None
        self.mapping = None
        self.device = "cpu"
        self.initialized = False
        self.model_name = None
        self.weight_name = None

    def initialize(self, context):
        """
           Load the model and mapping file to perform inference.
        """

        properties = context.system_properties
        model_dir = properties.get("model_dir")
        
        if not os.path.exists('/opt/ml/input'):
            os.makedirs('/opt/ml/input')
        
        print(model_dir)
        print(os.listdir(model_dir))
        
        # Load training configuration
        with open(os.path.join(model_dir, 'config.json'), 'r') as f:
            config = json.load(f)
            self.model_name = config.get('model_name', 'pointnet2_charlesssg')
            self.weight_name = config.get('weight_name', 'miou')
            self.forward_category = config.get('forward_category', 'Cap')

        print('config', config)
        print('model_name', self.model_name)
        print('forward_category', self.forward_category)

        # Read checkpoint file
        checkpoint_file_path = os.path.join(model_dir, "{}.pt".format(self.model_name))
        if not os.path.isfile(checkpoint_file_path):
            raise RuntimeError("Missing model checkpoint file: {}".format(checkpoint_file_path))

        # Prepare the model 
        checkpoint = ModelCheckpoint(model_dir, self.model_name, self.weight_name, strict=True)
        self.checkpoint = checkpoint
        
        print('checkpoint data_config', checkpoint.data_config)
        
        train_dataset_cls = get_dataset_class(self.checkpoint.data_config)
        setattr(self.checkpoint.data_config, "class", train_dataset_cls.FORWARD_CLASS)
        setattr(self.checkpoint.data_config, "forward_category", self.forward_category)
        
        self.initialized = True


    def forward_pass(self, model: BaseModel, dataset: BaseDataset, device, output_path):
        loaders = dataset.test_dataloaders
        predicted = {}
        for loader in loaders:
            with Ctq(loader) as tq_test_loader:
                for data in tq_test_loader:
                    with torch.no_grad():
                        model.set_input(data, device)
                        model.forward()
                    predicted = {**predicted, **dataset.predict_original_samples(data, model.conv_type, model.get_output())}
        return predicted

    def inference(self, data, context):
        # Write the request body to a per-request input directory
        input_dir = "/opt/ml/input/{}".format(context.get_request_id())
        os.makedirs(input_dir, exist_ok=True)
        input_file = os.path.join(input_dir, "inf_file.txt")
        with open(input_file, "w") as f:
            f.write(data[0]['body'].decode())

        try:
            # Point a copy of the training data config at the request directory
            data_config = self.checkpoint.data_config.copy()
            setattr(data_config, "dataroot", input_dir)

            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
            log.info("DEVICE : {}".format(device))

            # Disable the cuDNN backend
            torch.backends.cudnn.enabled = False

            # Dataset specific configs (uses the per-request dataroot set above)
            dataset = instantiate_dataset(data_config)
            model = self.checkpoint.create_model(dataset, weight_name=self.weight_name)
            log.info(model)
            log.info("Model size = %i", sum(param.numel() for param in model.parameters() if param.requires_grad))

            # Set dataloaders (model, batch size, shuffle, num workers, precompute multi-scale)
            dataset.create_dataloaders(
                model, 1, True, 4, False,
            )
            log.info(dataset)

            model.eval()
            model = model.to(device)

            # Run the forward pass
            os.makedirs('/opt/ml/output', exist_ok=True)
            prediction = self.forward_pass(model, dataset, device, '/opt/ml/output')
        finally:
            # Clean up the request's input file and directory, even on failure
            if os.path.exists(input_file):
                os.remove(input_file)
            if os.path.isdir(input_dir):
                os.rmdir(input_dir)
        return prediction
    
    def postprocess(self, inference_output):
        results = {}
        predictions = next(iter(inference_output.values())).tolist()
        results['response'] = predictions
        return  json.dumps(results)


_service = PyTorch3dPoint()
def handle(data, context):
    if not _service.initialized:
        _service.initialize(context)

    if data is None:
        return None
    
    #print('input data', data)
    
    data = _service.inference(data, context)
    results = _service.postprocess(data)

    return [results]
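
The model server calls `handle(data, context)` with a list of requests, each a dict whose `'body'` holds the raw bytes, and expects a list of responses back. The sketch below exercises that contract locally with a hypothetical `FakeContext` stub (not part of MMS or `sagemaker-inference`) and a toy handler that, like `inference()`, writes the body to a per-request directory and cleans up afterwards, returning the point count as JSON instead of running a model:

```python
import json
import os
import tempfile


class FakeContext:
    """Hypothetical stand-in for the model server's context object."""

    def __init__(self, model_dir, request_id):
        self.system_properties = {"model_dir": model_dir}
        self._request_id = request_id

    def get_request_id(self):
        return self._request_id


def toy_handle(data, context, input_root):
    """Toy handler: write the request body to disk, count the points, clean up."""
    if data is None:
        return None
    input_dir = os.path.join(input_root, context.get_request_id())
    os.makedirs(input_dir, exist_ok=True)
    input_file = os.path.join(input_dir, "inf_file.txt")
    try:
        with open(input_file, "w") as f:
            f.write(data[0]["body"].decode())
        with open(input_file) as f:
            n_points = sum(1 for line in f if line.strip())
    finally:
        # Remove the per-request file and directory even if parsing fails
        if os.path.exists(input_file):
            os.remove(input_file)
        os.rmdir(input_dir)
    # One JSON response per request, mirroring postprocess()
    return [json.dumps({"response": n_points})]


with tempfile.TemporaryDirectory() as root:
    ctx = FakeContext(model_dir=root, request_id="req-1")
    body = bytearray(b"0.1 0.2 0.3 0 0 1 6\n0.4 0.5 0.6 0 1 0 7\n")
    print(toy_handle([{"body": body}], ctx, input_root=root))  # ['{"response": 2}']
```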

Create the docker entrypoint file to start the model server

In [None]:
%%writefile docker-inference/dockerd-entrypoint.py
import subprocess
import sys
import shlex
import os
from retrying import retry
from subprocess import CalledProcessError
from sagemaker_inference import model_server

def _retry_if_error(exception):
    return isinstance(exception, (CalledProcessError, OSError))

@retry(stop_max_delay=1000 * 50,
       retry_on_exception=_retry_if_error)
def _start_mms():
    # by default the number of workers per model is 1, but we can configure it through the
    # environment variable below if desired.
    # os.environ['SAGEMAKER_MODEL_SERVER_WORKERS'] = '2'
    model_server.start_model_server(handler_service='/home/model-server/model_handler.py:handle')

def main():
    if sys.argv[1] == 'serve':
        _start_mms()
    else:
        subprocess.check_call(shlex.split(' '.join(sys.argv[1:])))

    # prevent docker exit
    subprocess.call(['tail', '-f', '/dev/null'])
    
main()
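
The retry predicate needs a tuple of exception types: `CalledProcessError or OSError` evaluates to just `CalledProcessError` (the first truthy operand), so an `OSError` would never trigger a retry. A quick stdlib check of the difference:

```python
from subprocess import CalledProcessError


def retry_if_error_buggy(exception):
    # `or` returns its first truthy operand, so this only checks CalledProcessError
    return isinstance(exception, CalledProcessError or OSError)


def retry_if_error(exception):
    # A tuple matches either exception type
    return isinstance(exception, (CalledProcessError, OSError))


err = OSError("model server not ready")
print(retry_if_error_buggy(err))  # False
print(retry_if_error(err))        # True
```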

Write the inference Dockerfile, which inherits from the PyTorch 1.7.1 CPU inference base image.

In [None]:
%%writefile docker-inference/Dockerfile
ARG REGION=us-east-1

# SageMaker PyTorch image
FROM 763104351884.dkr.ecr.$REGION.amazonaws.com/pytorch-inference:1.7.1-cpu-py36-ubuntu18.04

RUN apt-get update \
    && apt-get install -y --fix-missing --no-install-recommends\
    libffi-dev libssl-dev build-essential libopenblas-dev libsparsehash-dev\
    python3-pip python3-dev python3-venv python3-setuptools\
    git iproute2 procps lsb-release \
    libsm6 libxext6 libxrender-dev ninja-build \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

#Install dependent libraries for torch-points-3d
#(version specifiers are quoted so the shell does not treat ">" as a redirect)
RUN python3 -m pip install -U pip \
    && pip3 install "setuptools>=41.0.0" \
#     && pip3 install torch==1.7.0+cpu torchvision==0.8.1+cpu -f https://download.pytorch.org/whl/torch_stable.html \
#     && pip3 install torch==1.7.1+cpu torchvision==0.8.2+cpu -f https://download.pytorch.org/whl/torch_stable.html \
    && pip3 install MinkowskiEngine \
    && pip3 install git+https://github.com/mit-han-lab/torchsparse.git@f79df704e2fb3ea912c31d57e910ea0edba03da4 \
    && rm -rf /root/.cache

COPY pyproject.toml /opt/ml/code/pyproject.toml
COPY torch_points3d/ /opt/ml/code/torch_points3d/
COPY README.md /opt/ml/code/README.md
# COPY poetry.lock poetry.lock

RUN cd /opt/ml/code/ && pip3 install . && rm -rf /root/.cache

# RUN pip install poetry
# RUN poetry config virtualenvs.create false
# RUN poetry install

# Copy entrypoint script to the image
COPY docker-inference/dockerd-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
RUN chmod +x /usr/local/bin/dockerd-entrypoint.py
RUN mkdir -p /home/model-server/

# Copy the default custom service file to handle incoming data and inference requests
COPY docker-inference/model_handler.py /home/model-server/model_handler.py
RUN pip install multi-model-server sagemaker-inference plyfile

#Sagemaker environment variables
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/ml/code:${PATH}"

# Define an entrypoint script for the docker image
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]

# Define command to be passed to the entrypoint
CMD ["serve"]

### Build container

Create a script that builds the container image and pushes it to Amazon ECR.

Usage: `sh build_and_push.sh <path_to_dockerfile> <image_name>`

In [None]:
%%writefile build_and_push.sh

# Pass the docker file
docker_file=$1

# The argument to this script is the image name. This will be used as the image on the local
# machine and combined with the account and region to form the repository name for ECR.
image=$2

# Get the account number associated with the current IAM credentials
account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${image}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${image}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${image}" > /dev/null
fi

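# Log in to your account's ECR registry (get-login-password is supported by AWS CLI v2 and recent v1 releases)
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"

# Log in to the AWS Deep Learning Containers registry in order to pull the SageMaker PyTorch base image
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "763104351884.dkr.ecr.${region}.amazonaws.com"

# Build the docker image locally with the image name, passing the region as a build argument
docker build -f ${docker_file} -t ${image} . --build-arg REGION=${region}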

# Tag and push to ECR
docker tag ${image} ${fullname}
docker push ${fullname}
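
The script builds the repository URI from the account ID, region, and image name. A small Python sketch of the same string composition; the `ecr_image_uri` helper is hypothetical and mirrors `fullname` above, including the `us-east-1` fallback when no region is configured (the account ID below is a placeholder):

```python
def ecr_image_uri(account, image, region=None, tag="latest"):
    """Compose an ECR image URI like build_and_push.sh's `fullname` (hypothetical helper)."""
    region = region or "us-east-1"  # same fallback as the shell script
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account, region, image, tag)


print(ecr_image_uri("123456789012", "sagemaker-torchpoints3d-training", "eu-west-1"))
# 123456789012.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-torchpoints3d-training:latest
```

The same format is used later in the notebook to build `training_image` and `inference_image`.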

## Build and Push containers

Build the training and inference containers with the following commands:

```
$ sh build_and_push.sh docker-train/Dockerfile sagemaker-torchpoints3d-training
$ sh build_and_push.sh docker-inference/Dockerfile sagemaker-torchpoints3d-inference
```

Check the available disk space: you will need approximately 20 GB free. Use the prune command to reclaim space if required.

```
$ docker system prune -a -f
```

You might need to increase the size of the tmpfs drive when building the training container.

```
$ sudo mount -o size=20G,rw,nodev,nosuid -t tmpfs tmpfs /tmp
```

You may also need to configure the Docker `data-root` to use the tmp directory:
1. Edit the `OPTIONS` variable in the `/etc/sysconfig/docker` file
2. Configure it to use the tmp drive, e.g. `--data-root /tmp/docker`
3. Restart the Docker daemon: `sudo service docker restart`

In [None]:
!df -h
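
If you prefer to check from Python, `shutil.disk_usage` reports the total, used, and free bytes for a filesystem, much like `df`. This sketch compares the free space against the roughly 20 GB the build needs; the threshold and the `/` path are assumptions, so point it at your Docker data root if you moved it:

```python
import shutil

REQUIRED_GB = 20  # approximate free space needed to build the training container


def has_enough_space(path="/", required_gb=REQUIRED_GB):
    """Return (ok, free_gb) for the filesystem containing `path`."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return free_gb >= required_gb, free_gb


ok, free_gb = has_enough_space("/")
print("free: {:.1f} GB, enough: {}".format(free_gb, ok))
```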

Build the training container. Building from scratch takes approximately 1 hour, so maybe get a ☕

In [None]:
%%time

!sh build_and_push.sh docker-train/Dockerfile sagemaker-torchpoints3d-training

Build inference container

In [None]:
%%time

!sh build_and_push.sh docker-inference/Dockerfile sagemaker-torchpoints3d-inference

## Download Dataset

Download the `shapenet` dataset; this should take about 1 minute.

In [None]:
!wget --no-check-certificate "https://shapenet.cs.stanford.edu/media/shapenetcore_partanno_segmentation_benchmark_v0_normal.zip"

Unzip the dataset; this should take about 20 seconds, and the extracted size should be about 2.8 GB.

In [None]:
%%time

!mkdir -p dataset/raw
!unzip -q shapenetcore_partanno_segmentation_benchmark_v0_normal.zip -d ./dataset

In [None]:
!du -h dataset
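
To double-check the extracted size and file count from Python, the hypothetical `summarize_dir` helper below walks the directory with the standard library. It sums apparent file sizes, so the total can differ slightly from the block usage that `du` reports:

```python
import os


def summarize_dir(path):
    """Return (file_count, total_bytes) for all regular files under `path`."""
    count, total = 0, 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                count += 1
                total += os.path.getsize(full)
    return count, total


n_files, n_bytes = summarize_dir("dataset")
print("{} files, {:.2f} GB".format(n_files, n_bytes / 1024 ** 3))
```

The file count is a quick way to confirm you have the roughly 16k files mentioned in the upload step.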

Upload the raw dataset to the `torchpoints3d/shapenet/raw` prefix in your default SageMaker S3 bucket. Uploading the ~16k files should take about 3 minutes.

In [None]:
from sagemaker import get_execution_role
from sagemaker import session

s3_shapenet_uri = 's3://{}/torchpoints3d'.format(session.Session().default_bucket())

In [None]:
%%time

!aws s3 sync dataset/shapenetcore_partanno_segmentation_benchmark_v0_normal $s3_shapenet_uri/shapenet/raw --quiet

Check that we have some files uploaded

In [None]:
!aws s3 ls $s3_shapenet_uri/shapenet/raw/
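
Because `estimator.fit(s3_shapenet_uri)` passes a single S3 URI, SageMaker exposes it as the default `training` channel, so objects under that prefix appear under `/opt/ml/input/data/training/` inside the container (and `SM_CHANNEL_TRAINING` points there). The hypothetical `container_path` helper below sketches that mapping for an uploaded object; `my-bucket` is a placeholder:

```python
def container_path(s3_uri, channel_uri, channel="training"):
    """Map an S3 object URI under `channel_uri` to its path inside the training container."""
    prefix = channel_uri.rstrip("/") + "/"
    if not s3_uri.startswith(prefix):
        raise ValueError("{} is not under {}".format(s3_uri, channel_uri))
    relative = s3_uri[len(prefix):]
    return "/opt/ml/input/data/{}/{}".format(channel, relative)


print(container_path(
    "s3://my-bucket/torchpoints3d/shapenet/raw/02691156/1021a0914a7207aff927ed529ad90a11.txt",
    "s3://my-bucket/torchpoints3d",
))
# /opt/ml/input/data/training/shapenet/raw/02691156/1021a0914a7207aff927ed529ad90a11.txt
```

This is why the wrapper passes `SM_CHANNEL_TRAINING` as `data.dataroot`: the `shapenet/raw` folder ends up directly beneath it.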

### Test training container

You can test the training container by running an interactive docker container and attaching the downloaded dataset

```
$ docker run -it --mount src="$(pwd)/dataset/shapenetcore_partanno_segmentation_benchmark_v0_normal",target="/opt/ml/input/data/training/shapenet/raw",type=bind sagemaker-torchpoints3d-training:latest
```

Once in the container, create the output directories and run the training script

```
$ cd /opt/ml/code
$ mkdir -p /opt/ml/model /opt/ml/output/data
$ SM_MODEL_DIR=/opt/ml/model SM_OUTPUT_DATA_DIR=/opt/ml/output/data SM_CHANNEL_TRAINING=/opt/ml/input/data/training python sagemaker_train.py --epochs 3
```

This will run for a few short epochs, and write the output model to `/opt/ml/model` and training logs to `/opt/ml/output/data`

Delete the zip and dataset folder

In [None]:
!rm shapenetcore_partanno_segmentation_benchmark_v0_normal.zip && rm -Rf dataset

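## Train 3D point cloud model

Train a 3D point cloud part-segmentation model on the `shapenet` dataset.

`estimator.fit()` runs the training job on a managed SageMaker training instance. A local-mode estimator (`local_estimator.fit()`) would instead run the same container on this Jupyter notebook instance.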

In [None]:
import boto3
from sagemaker.estimator import Estimator
from sagemaker import get_execution_role

account_id = boto3.client('sts').get_caller_identity().get('Account')
region =  boto3.session.Session().region_name
role = get_execution_role()

training_image = '{}.dkr.ecr.{}.amazonaws.com/sagemaker-torchpoints3d-training:latest'.format(account_id, region)

hyperparameters = {"epochs": 100,
                   "lr": 0.01}

# The image is passed positionally; passing it again as image_name= raises a TypeError
estimator = Estimator(training_image,
                      role=role,
                      train_instance_count=1,
                      train_instance_type='ml.p3.2xlarge',
                      hyperparameters=hyperparameters)

estimator.fit(s3_shapenet_uri)
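
For a custom image that uses the `SAGEMAKER_PROGRAM` entry point, the SageMaker training toolkit passes each hyperparameter to the script as a `--name value` command-line argument, which is why `sagemaker_train.py` declares matching `argparse` options. A rough stdlib sketch of that conversion (the `to_cli_args` helper is hypothetical, not the toolkit's own code):

```python
import argparse


def to_cli_args(hyperparameters):
    """Convert a hyperparameter dict into --name value arguments (hypothetical helper)."""
    args = []
    for name, value in hyperparameters.items():
        args.extend(["--{}".format(name), str(value)])
    return args


hyperparameters = {"epochs": 100, "lr": 0.01}
argv = to_cli_args(hyperparameters)
print(argv)  # ['--epochs', '100', '--lr', '0.01']

# The wrapper's argparse options receive the values as strings
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", default="100")
parser.add_argument("--lr", default="0.001")
parsed = parser.parse_args(argv)
print(parsed.epochs, parsed.lr)  # 100 0.01
```

Keeping the values as strings is harmless here because the wrapper only formats them into Hydra override strings.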

Download the training job model archive and list the contents

In [None]:
!aws s3 cp $estimator.model_data .
!mkdir -p model && tar -xvf model.tar.gz -C model
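
The inference handler expects the archive to contain `config.json` and a `<model_name>.pt` checkpoint (for example `pointnet2_charlesssg.pt`). The hypothetical `missing_model_files` helper below uses `tarfile` to check for both before deploying:

```python
import os
import tarfile


def missing_model_files(archive_path, model_name="pointnet2_charlesssg"):
    """Return the files the inference handler needs that are missing from the archive."""
    required = {"config.json", "{}.pt".format(model_name)}
    with tarfile.open(archive_path, "r:gz") as tar:
        # Normalize member names such as "./config.json" to "config.json"
        names = {name[2:] if name.startswith("./") else name for name in tar.getnames()}
    return sorted(required - names)


# Check the archive downloaded above, if present
if os.path.exists("model.tar.gz"):
    print(missing_model_files("model.tar.gz"))  # an empty list means both files were found
```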

## Inference

Download a sample input file

In [None]:
!aws s3 cp $s3_shapenet_uri/shapenet/raw/02691156/1021a0914a7207aff927ed529ad90a11.txt test_inf.txt

Inspect the file

In [None]:
!head test_inf.txt
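
Each line of the ShapeNet part-segmentation "normal" files should hold seven space-separated values: the `x y z` coordinates, the `nx ny nz` normal, and a part label. This column layout is our assumption about the dataset format, so check it against the `head` output above. A stdlib sketch that parses such text into points, normals, and labels:

```python
def parse_shapenet_txt(text):
    """Parse 'x y z nx ny nz label' lines into (points, normals, labels)."""
    points, normals, labels = [], [], []
    for line in text.splitlines():
        values = line.split()
        if not values:
            continue  # skip blank lines
        if len(values) != 7:
            raise ValueError("expected 7 columns, got {}: {!r}".format(len(values), line))
        x, y, z, nx, ny, nz = (float(v) for v in values[:6])
        points.append((x, y, z))
        normals.append((nx, ny, nz))
        labels.append(int(float(values[6])))  # labels may be written as floats
    return points, normals, labels


sample = "0.1 0.2 0.3 0.0 0.0 1.0 1.000000\n0.4 0.5 0.6 0.0 1.0 0.0 2.000000\n"
points, normals, labels = parse_shapenet_txt(sample)
print(len(points), labels)  # 2 [1, 2]
```

To inspect the downloaded sample, pass it the contents of `test_inf.txt`, e.g. `parse_shapenet_txt(open('test_inf.txt').read())`.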

### Test Inference Container

You can test the inference container by running an interactive docker container on port 8080 and attaching the trained model.

```
$ docker run --rm -p 8080:8080 --mount src="$(pwd)/model",target="/opt/ml/model",type=bind sagemaker-torchpoints3d-inference:latest serve
```

Once the model server is running, check the ping response:

```
$ curl localhost:8080/ping
{
  "status": "Healthy"
}
```

Then post a request to the invocation endpoint with the sample file

```
$ curl -X POST http://localhost:8080/invocations -T test_inf.txt
```
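
The same checks can be scripted with Python's standard library. This sketch targets port 8080, the port published by the `docker run` command above, and the SageMaker-style `/ping` and `/invocations` routes; the helper names are hypothetical. Building the `Request` objects does not contact the server, so the final lines are commented out until the container is running:

```python
import urllib.request

BASE_URL = "http://localhost:8080"  # port published by `docker run -p 8080:8080`


def ping_request(base_url=BASE_URL):
    """Build a GET request for the health check route."""
    return urllib.request.Request(base_url + "/ping", method="GET")


def invoke_request(body, base_url=BASE_URL):
    """Build a POST request that sends the raw point cloud file to /invocations."""
    return urllib.request.Request(
        base_url + "/invocations",
        data=body,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a request and return the decoded response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Uncomment once the container is running:
# print(send(ping_request()))
# with open("test_inf.txt", "rb") as f:
#     print(send(invoke_request(f.read()))[:200])
```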


## Deploy model

Deploy the model for real-time inference

In [None]:
from time import gmtime, strftime

sm_client = boto3.client(service_name='sagemaker')



In [None]:
from sagemaker.utils import name_from_image

inference_image = '{}.dkr.ecr.{}.amazonaws.com/sagemaker-torchpoints3d-inference:latest'.format(account_id, region)

# Get a endpoint name based on the image
endpoint_name = name_from_image(inference_image)

container = {
    'Image': inference_image,
    'ModelDataUrl': estimator.model_data
}

create_model_response = sm_client.create_model(
    ModelName = endpoint_name,
    ExecutionRoleArn = role,
    Containers = [container])

print("Model Arn: " + create_model_response['ModelArn'])

In [None]:
endpoint_config_name = endpoint_name + 'EPConf'
print('Endpoint config name: ' + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': 'ml.c5.xlarge',
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName': endpoint_name,
        'VariantName': 'AllTraffic'}])

print("Endpoint config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

In [None]:
import time

print('Endpoint name: ' + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Endpoint Status: " + status)

print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

In [None]:
# from sagemaker.utils import name_from_image

# inference_image = '{}.dkr.ecr.{}.amazonaws.com/sagemaker-torchpoints3d-inference:latest'.format(account_id, region)

# # Get a endpoint name based on the image
# endpoint_name = name_from_image(inference_image)

# # Deploy the endpoint
# predictor = estimator.deploy(initial_instance_count=1, 
#                              instance_type='ml.c5.xlarge',
#                              image=inference_image,
#                              endpoint_name=endpoint_name)

Load the input file into a byte array

In [None]:
filename = 'test_inf.txt'

with open(filename, 'rb') as file:
    body = file.read()
    body = bytearray(body)

In [None]:
%%time

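import json
import boto3

# SageMaker runtime client used to invoke the endpoint
runtime_sm_client = boto3.client('sagemaker-runtime')

response = runtime_sm_client.invoke_endpoint(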
    EndpointName=endpoint_name,
    Body=body,
    ContentType = 'application/octet-stream')

results = response['Body'].read()
len(results)

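Alternatively, make the call with the SageMaker Python SDK predictor

In [None]:
%%time

import sagemaker

# RealTimePredictor is the SageMaker Python SDK v1 class (it is named Predictor in v2)
predictor = sagemaker.predictor.RealTimePredictor(endpoint_name)
results = predictor.predict(body)
len(results)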

Or perform inference with the boto3 client

In [None]:
# Uncomment and edit to target an existing endpoint instead of the one created above
# endpoint_name = 'sagemaker-torchpoints3d-inference-2020-07-24-04-55-44-146'

In [None]:
%%time 

import boto3

client = boto3.client('sagemaker-runtime')

response = client.invoke_endpoint(
    EndpointName= endpoint_name,
    Body= body,
    ContentType = 'application/octet-stream')

results = response['Body'].read()
len(results)

## Visualize

Load the results from the prediction as a NumPy array and visualize them with [mplot3d](https://matplotlib.org/mpl_toolkits/mplot3d/index.html), or interactively in JupyterLab with [ipyvolume](https://ipyvolume.readthedocs.io/en/latest/install.html#for-jupyter-lab-users)

In [None]:
!pip install ipyvolume -q

In [None]:
import numpy as np
import json

data = np.array(json.loads(results)['response'])
print(data.shape)

data[0:2]
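
Each row of the response should be `[x, y, z, label]`, which is how the plotting cells below index columns 0 to 3. The hypothetical `group_by_label` helper below is a stdlib sketch that parses the JSON returned by `postprocess()` and groups the points by predicted label, which is handy for checking which part labels the model produced before plotting:

```python
import json
from collections import defaultdict


def group_by_label(results_json):
    """Group [x, y, z, label] rows from a {"response": [...]} payload by label."""
    rows = json.loads(results_json)["response"]
    groups = defaultdict(list)
    for x, y, z, label in rows:
        groups[int(label)].append((x, y, z))
    return dict(groups)


payload = json.dumps({"response": [[0.1, 0.2, 0.3, 6], [0.0, 0.1, 0.2, 7], [0.3, 0.3, 0.3, 6]]})
groups = group_by_label(payload)
print({label: len(points) for label, points in groups.items()})  # {6: 2, 7: 1}
```

`json.loads` also accepts the raw `bytes` returned by the endpoint, so `group_by_label(results)` works directly on the response body.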

Visualize with mplot3d

In [None]:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(111, projection='3d')

x,y,z,c = data[:,0], data[:,1], data[:,2], data[:,3]

ax.scatter(x, y, z, c=c, marker='o')

ax.set(xlim=(-0.4, 0.4), ylim=(-0.4, 0.4))
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

ax.view_init(elev=0., azim=90)

plt.show()

Visualize the point cloud in a 3D ipyvolume widget

In [None]:
import ipyvolume as ipv

def get_coord(data, color):
    mask = data[:,3]==color
    return data[:,0][mask], data[:,1][mask], data[:,2][mask]

fig = ipv.figure(width=600, height=600)

x,y,z = get_coord(data, 6)
scatter = ipv.scatter(x, y, z, size=1, marker='sphere', color='grey')

x,y,z = get_coord(data, 7)
scatter = ipv.scatter(x, y, z, size=1, marker='sphere', color='yellow')

ipv.xyzlim(-0.5, 0.5)
ipv.show()
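
The cell above hard-codes labels 6 and 7, the two Cap part labels, matching the default `forward_category`. To plot whatever labels the model returns, you could assign colors from a small palette instead; the hypothetical `label_colors` helper sketches that, cycling through the palette if there are more labels than colors:

```python
from itertools import cycle

PALETTE = ["grey", "yellow", "red", "blue", "green", "orange", "purple", "cyan"]


def label_colors(labels, palette=PALETTE):
    """Assign a color name to each distinct label, in sorted label order."""
    colors = cycle(palette)
    return {label: next(colors) for label in sorted(set(labels))}


print(label_colors([7, 6, 6, 7]))  # {6: 'grey', 7: 'yellow'}

# Usage with ipyvolume (assumes `data`, `get_coord`, and `ipv` from the cells above):
# fig = ipv.figure(width=600, height=600)
# for label, color in label_colors(data[:, 3].astype(int)).items():
#     x, y, z = get_coord(data, label)
#     ipv.scatter(x, y, z, size=1, marker='sphere', color=color)
# ipv.xyzlim(-0.5, 0.5)
# ipv.show()
```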