## Amazon SageMaker DICOM Training Overview

This example demonstrates how to integrate the [MONAI](http://monai.io) framework into Amazon SageMaker, use data labelled with SageMaker Ground Truth, and apply MONAI pre-processing transforms and a DenseNet neural network to train a medical image classification model directly on DICOM images. For additional details on how to deploy the MONAI model, pipe input data from S3, and perform batch inference with SageMaker batch transform, see [Build a medical image analysis pipeline on Amazon SageMaker using the MONAI framework](https://aws.amazon.com/blogs/industries/build-a-medical-image-analysis-pipeline-on-amazon-sagemaker-using-the-monai-framework/).

For more information about PyTorch in SageMaker, please visit the [sagemaker-pytorch-containers](https://github.com/aws/sagemaker-pytorch-containers) and [sagemaker-python-sdk](https://github.com/aws/sagemaker-python-sdk) GitHub repositories.

### Note: this notebook uses BYOS (bring your own script) and BYOC (bring your own container) for deployment. Please refer to [pytorch-inference-handler](https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/src/sagemaker_pytorch_serving_container/default_pytorch_inference_handler.py) for details about the default deployment.

The sample dataset comes from [COVID-CT-MD](https://github.com/ShahinSHH/COVID-CT-MD). It contains volumetric chest CT scans (DICOM files) of 169 patients positive for COVID-19 infection, 60 patients with CAP (Community Acquired Pneumonia), and 76 normal patients.

For the client-side API used below, refer to the SageMaker [Predictor](https://sagemaker.readthedocs.io/en/latest/predictors.html#) documentation.

### Install necessary packages 
Packages to be installed:
+ monai (installing monai upgrades torch to the latest version, so torchvision has to be upgraded in the following step)
+ itk
+ pillow
+ pandas

In [None]:
!pip install -r ./source/requirements.txt

In [None]:
!pip install --upgrade torch torchvision  ## upgrade torchvision so that it matches the torch version pulled in by monai

In [1]:
import torch, torchvision
print(torch.__version__)
print(torchvision.__version__)

1.10.1+cu102
0.11.2+cu102
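Since installing monai can change the installed torch version, it may help to confirm which versions are actually present before continuing. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Return {package name: installed version, or None if it is not installed}."""
    found = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages from the install list; missing ones are reported as "not installed".
packages = ["monai", "torch", "torchvision", "itk", "pillow", "pandas"]
for name, ver in installed_versions(packages).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```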


## Import libraries

In [8]:
from sagemaker.predictor import Predictor
from monai.transforms import Compose, LoadImage, Resize, ScaleIntensity, ToTensor, SqueezeDim
from sagemaker.serializers import NumpySerializer, JSONSerializer
from sagemaker.deserializers import NumpyDeserializer, JSONDeserializer

import os
from sagemaker.s3 import S3Downloader, S3Uploader


## Model deployment

Here, we demonstrate three different ways to deploy a model in SageMaker.


### Method 1: deploy a model directly if it was trained in the same notebook


  *estimator.deploy(param)*
  
This method deploys the model with the default inference handler, which uses NumPy for both serialization and deserialization. Refer to the [pytorch-inference-handler](https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/src/sagemaker_pytorch_serving_container/default_pytorch_inference_handler.py) for details.

To run inference against the endpoint correctly, the client has to use the same serializer and deserializer:

*serializer=NumpySerializer(),deserializer=NumpyDeserializer()*

In [None]:
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')

### Method 2: deploy a model with customized processing

Assuming that you have already trained the model in a previous step, you can deploy an endpoint with the same *estimator.deploy* API and the following additional parameters:

1. entry_point: the inference script with the customized processing
2. source_dir: the folder that contains the inference script
3. serializer
4. deserializer





In [None]:
predictor2 = estimator.deploy(
    initial_instance_count=1, instance_type='ml.m5.xlarge',
    entry_point='inference.py', source_dir='source',
    # JSONSerializer and NumpyDeserializer are imported at the top of the notebook
    serializer=JSONSerializer(), deserializer=NumpyDeserializer(),
)
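The script passed as entry_point can override the handler functions that the SageMaker PyTorch inference toolkit looks up by name: model_fn, input_fn, predict_fn, and output_fn. The sketch below is not the *inference.py* used in this example; it only illustrates, with the standard library, what an input_fn and output_fn for a JSON request could look like. The request fields `bucket` and `key`, the response layout, and the sample values are assumptions chosen for illustration.

```python
import json

JSON_CONTENT_TYPE = "application/json"


def input_fn(request_body, request_content_type):
    """Parse a JSON request that points to a DICOM object in S3."""
    if request_content_type != JSON_CONTENT_TYPE:
        raise ValueError(f"Unsupported content type: {request_content_type}")
    payload = json.loads(request_body)
    # Assumed request layout: {"bucket": ..., "key": ...}
    missing = [field for field in ("bucket", "key") if field not in payload]
    if missing:
        raise ValueError(f"Missing fields in request: {missing}")
    return {"bucket": payload["bucket"], "key": payload["key"]}


def output_fn(prediction, accept):
    """Serialize a (class name, probability) pair as JSON."""
    if accept != JSON_CONTENT_TYPE:
        raise ValueError(f"Unsupported accept type: {accept}")
    class_name, probability = prediction
    body = {"results": {"class": class_name, "probability": round(probability, 2)}}
    return json.dumps(body)


# model_fn(model_dir) and predict_fn(input_object, model) would load the network
# and run it; they depend on torch and monai and are left out of this sketch.
request = json.dumps({"bucket": "my-bucket", "key": "scans/example.dcm"})  # hypothetical values
print(input_fn(request, JSON_CONTENT_TYPE))
print(output_fn(("Normal", 0.8694), JSON_CONTENT_TYPE))
```

Keeping the parsing and formatting in input_fn and output_fn leaves predict_fn free to deal only with the model call.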

### Method 3: deploy a model with an artifact saved in S3

This is the most common approach when you want to reuse a previously trained artifact.
To find the model artifact:

+ SageMaker console --> Training --> Training Jobs --> select the training job --> find the S3 path of the model artifact

<img src="model artifact.png"
     alt="S3 path of the model artifact in the SageMaker console"
     style="margin-right: 20px; margin-left: 40px;" />

**Note**: Be very careful about the content of the *inference.py* file.


Before running model prediction, it is essential to call **model.eval()**. Otherwise layers such as dropout and batch normalization stay in training mode and the inferred results are wrong.
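To see why the mode matters, consider dropout: in training mode it randomly zeroes activations, while in evaluation mode it passes them through unchanged (batch normalization similarly switches from batch statistics to its stored running statistics). The toy class below imitates that switch in plain Python; it is a hypothetical stand-in for illustration, not PyTorch code.

```python
import random


class ToyDropout:
    """Hypothetical stand-in for a dropout layer with train/eval modes."""

    def __init__(self, p=0.5, seed=0):
        self.p = p
        self.training = True  # layers start in training mode, as in PyTorch
        self._rng = random.Random(seed)

    def eval(self):
        self.training = False
        return self

    def __call__(self, values):
        if not self.training:
            return list(values)  # evaluation mode: identity
        keep = 1.0 - self.p
        # training mode: drop each value with probability p, rescale the rest
        return [v / keep if self._rng.random() < keep else 0.0 for v in values]


layer = ToyDropout(p=0.5)
activations = [1.0, 2.0, 3.0, 4.0]
print("training mode:", layer(activations))  # randomly zeroed or rescaled
layer.eval()
print("eval mode:    ", layer(activations))  # identical to the input
```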

In [None]:
from sagemaker.pytorch import PyTorchModel

# `role` is assumed to be the SageMaker execution role defined earlier (for example, during training)
model_data = "s3://sagemaker-us-east-1-707754867495/pytorch-training-2022-03-03-10-17-12-889/model.tar.gz"  ## model artifact from S3
model = PyTorchModel(
    entry_point="inference.py",  ## inference code with customization
    source_dir="source",         ## folder with the inference code
    role=role,
    model_data=model_data,
    framework_version="1.5.0",
    py_version="py3",
)

# JSONSerializer and JSONDeserializer are imported at the top of the notebook
predictor3 = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

## Model Inference
This section demonstrates three ways to run inference:
1. Local inference: download the model artifact to the notebook's EBS volume and run the model directly
2. Inference with an existing endpoint that uses the default handler (endpoint 1)
3. Inference with an existing endpoint that uses a customized handler (predictor3)

In [1]:
## define dataloader
# valX: list of local DICOM files
# valY: list of labels

from source.monai_dicom import DICOMDataset
import torch

def get_val_data_loader(valX, valY):
    val_transforms = Compose([
        LoadImage(image_only=True),      # load the DICOM file and return only the image array
        ScaleIntensity(),                # rescale intensities (to [0, 1] with the default arguments)
        Resize(spatial_size=(512, -1)),  # -1 keeps the original size of that dimension
        ToTensor(),
    ])

    dataset = DICOMDataset(valX, valY, val_transforms)
    return torch.utils.data.DataLoader(dataset, batch_size=1, num_workers=1)
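As a rough illustration of two of the transforms above: ScaleIntensity with its default arguments rescales intensities to the range [0, 1], and a non-positive entry in the spatial_size of Resize keeps the corresponding input dimension. The plain-Python functions below are hypothetical re-implementations for illustration, not MONAI code, and the sample values are made up.

```python
def scale_intensity(values, minv=0.0, maxv=1.0):
    """Min-max rescale a flat list of intensities to [minv, maxv]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: nothing to stretch. Mapping to minv is a choice made for this sketch.
        return [minv for _ in values]
    return [minv + (v - lo) * (maxv - minv) / (hi - lo) for v in values]


def resolve_spatial_size(spatial_size, input_size):
    """Replace non-positive target sizes with the matching input dimension."""
    return tuple(t if t > 0 else s for t, s in zip(spatial_size, input_size))


# Made-up CT-like intensities
print(scale_intensity([-1000.0, 0.0, 1000.0]))      # [0.0, 0.5, 1.0]
print(resolve_spatial_size((512, -1), (256, 300)))  # (512, 300)
```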

In [21]:
inf_test = os.listdir('test')
inf_test = ['test/' + s for s in inf_test]
print(inf_test)

inf_test_label = ['normal']
val_loader = get_val_data_loader(inf_test, inf_test_label)  ## local data is prepared
class_names = ["Normal", "Cap", "Covid"]  ## class names for model output indices 0, 1, 2



['test/normal-IM0062.dcm']


### Inference with the artifact downloaded locally


In [None]:
#### Download the model here

In [11]:
Model_S3 = 's3://sagemaker-us-east-1-707754867495/pytorch-training-2022-03-03-08-58-19-414/output/model.tar.gz'
S3Downloader.download(Model_S3, local_path='artifact2') ## download model.tar.gz to local subFolder artifact2
os.chdir('artifact2')

In [14]:
!tar -xzvf ./model.tar.gz #unzip the model artifact

model.pth
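The same extraction can also be done from Python with the standard tarfile module, which avoids changing the working directory. This is a minimal sketch; the function name `extract_model_artifact` is hypothetical, and it should only be used on archives you trust.

```python
import os
import tarfile


def extract_model_artifact(archive_path, dest_dir):
    """Extract a model.tar.gz archive into dest_dir and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest_dir)
    return names


# Usage, assuming the archive was downloaded to artifact2/ as in the cell above:
# extract_model_artifact("artifact2/model.tar.gz", "artifact2")
```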


In [15]:
os.chdir('..')
os.getcwd()

'/home/ec2-user/SageMaker/repo'

In [17]:
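## load the model
from monai.networks.nets import densenet121
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = densenet121(
    spatial_dims=2,
    in_channels=1,
    out_channels=3
)
model_dir = './artifact2'  # folder where model.tar.gz was downloaded and extracted above

with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
    # map_location lets weights saved on a GPU load on a CPU-only machine
    model.load_state_dict(torch.load(f, map_location=device))
model.to(device)  # keep the model on the same device as the inputs
model.eval()      # switch to evaluation mode before predicting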

DenseNet121(
  (features): Sequential(
    (conv0): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (layers): Sequential(
          (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu1): ReLU(inplace=True)
          (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu2): ReLU(inplace=True)
          (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
      )
      (denselayer2): _DenseLayer(
        (layers): Sequential(
          (norm1): BatchN

### Inference with the default endpoint (Model 2), using NumpySerializer and NumpyDeserializer

In [25]:
predictor = Predictor(
    endpoint_name='pytorch-training-2022-03-04-10-13-17-022',
    serializer=NumpySerializer(),
    deserializer=NumpyDeserializer(),
)

In [26]:
for i, val_data in enumerate(val_loader):
    print('actual class is ', val_data[1])

    # Move the trailing axis to the channel position: (N, d1, d2, C) -> (N, C, d2, d1)
    inputs = val_data[0].permute(0, 3, 2, 1)
    inputs = inputs.to(device)

    # Local results, using the model loaded from the downloaded artifact
    response = model(inputs)
    print("------------MODEL 1 --------------")
    print("response:", response)

    # response is already a tensor; detach() avoids the torch.tensor(tensor) copy warning
    pred = torch.nn.functional.softmax(response.detach(), dim=1)
    top_p, top_class = torch.topk(pred, 1)

    print("predicted probability: ", pred)
    print('predicted class: ' + class_names[top_class.item()])
    print('predicted class probability: ' + str(round(top_p.item(), 2)))

    print("------------MODEL 2 --------------")

    # The endpoint uses NumpySerializer, so send a NumPy array
    response2 = predictor.predict(inputs.cpu().numpy())
    print('response2 is ', response2)
    pred = torch.nn.functional.softmax(torch.tensor(response2), dim=1)

    top_p, top_class = torch.topk(pred, 1)

    print("predicted probability: ", pred)
    print('predicted class: ' + class_names[top_class.item()])
    print('predicted class probability: ' + str(round(top_p.item(), 2)))

actual class is  ('normal',)
------------MODEL 1 --------------
response: tensor([[ 1.8583, -1.0010, -1.0331]], grad_fn=<AddmmBackward0>)
predicted probability:  tensor([[0.8986, 0.0515, 0.0499]])
predicted class: Normal
predicted class probability: 0.9
------------MODEL 2 --------------
response2 is  [[ 1.68081331 -0.9765076  -0.84340179]]
predicted probability:  tensor([[0.8694, 0.0610, 0.0697]], dtype=torch.float64)
predicted class: Normal
predicted class probability: 0.87
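The post-processing applied to both responses (a softmax over the logits, then the top class) can be checked by hand. The sketch below redoes it in plain Python on the MODEL 1 logits printed above; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(logits, class_names):
    """Return (class name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


class_names = ["Normal", "Cap", "Covid"]
name, prob = top_class([1.8583, -1.0010, -1.0331], class_names)
print(name, round(prob, 2))  # Normal 0.9
```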


### Inference with the customized endpoint, using a JSON payload that points to a DICOM file in S3

In [29]:
predictor2 = Predictor(
    endpoint_name='pytorch-inference-2022-03-04-02-55-59-352',
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

In [30]:
%%time
payload = {"bucket": "dataset-pathology",
           "key": "test_data/normal-IM0062.dcm"}  # other dcm files to consider: 'covid-IM0073.dcm', 'normal-IM0062.dcm'

predictor2.predict(payload)

CPU times: user 10.8 ms, sys: 0 ns, total: 10.8 ms
Wall time: 3.17 s


{'results': {'class': 'Normal', 'probability': 0.87}}
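The payload is simply the bucket and key of the DICOM object. When the location is available as an s3:// URI instead, a small helper can split it into that form. This is a sketch using the standard library; the helper name `payload_from_s3_uri` is hypothetical.

```python
from urllib.parse import urlparse


def payload_from_s3_uri(s3_uri):
    """Split an s3://bucket/key URI into a {"bucket": ..., "key": ...} payload."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri}")
    key = parsed.path.lstrip("/")
    if not key:
        raise ValueError(f"S3 URI has no object key: {s3_uri}")
    return {"bucket": parsed.netloc, "key": key}


print(payload_from_s3_uri("s3://dataset-pathology/test_data/normal-IM0062.dcm"))
```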

In [139]:
modelxx = densenet121(
    spatial_dims=2,
    in_channels=1,
    out_channels=3
)
model_dir=''
with open('model.pth', 'rb') as f:
    modelxx.load_state_dict(torch.load(f))
    modelxx.jit.