# CheXpert: A Large Chest X-Ray Dataset and Competition

This competition, launched by the Stanford ML Group, aims to find a prediction model that performs as well as radiologists at detecting different pathologies on chest X-rays. The dataset available to train our model consists of 223,414 chest radiographs of 65,240 patients.

<img src="view1_frontal.jpg" title="X-Ray image of the dataset" width = 320/>

The website of the competition:
https://stanfordmlgroup.github.io/competitions/chexpert/

[Publication](https://arxiv.org/pdf/1901.07031.pdf) : Irvin, Jeremy, et al. "CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison." arXiv preprint arXiv:1901.07031 (2019).

Our first goal is to reproduce the main results of the related paper, published in January 2019.

In [3]:
!pip install -qU torchvision


In [4]:
!pip install pillow


In [5]:
!pip install requests


## Contents

1. [Background](#Background)
1. [Setup](#Setup)
1. [Data](#Data)
1. [Train](#Train)
1. [Host](#Host)

---

## Background

We used this open-source notebook: https://www.kaggle.com/code/dnik007/pneumonia-detection-using-pytorch

For more information about PyTorch on SageMaker, see the [sagemaker-pytorch-containers](https://github.com/aws/sagemaker-pytorch-containers) and [sagemaker-python-sdk](https://github.com/aws/sagemaker-python-sdk) GitHub repositories.

---

## Setup

_This notebook was created and tested on an ml.m4.xlarge notebook instance._

Let's start by creating a SageMaker session and specifying:

- The S3 bucket and prefix that you want to use for training and model data. This should be in the same region as the notebook instance, training, and hosting.
- The IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these. Note: if more than one role is required for notebook instances, training, and/or hosting, replace `sagemaker.get_execution_role()` with the appropriate full IAM role ARN string(s).


In [4]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()

role = sagemaker.get_execution_role()

In [5]:
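import json

# The resource metadata file exposes the user profile name;
# this notebook uses that name as the bucket holding the data
with open("/opt/ml/metadata/resource-metadata.json", "r") as f:
    metadata = json.load(f)
bucket = metadata["UserProfileName"]
bucket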

'7f1569a2-45b7-47fa-9c63-1f81fe1ac06f'

In [6]:
bucket

'7f1569a2-45b7-47fa-9c63-1f81fe1ac06f'

In [7]:
from boto3 import client

conn = client("s3")  # uses the notebook's default AWS credentials and region
for key in conn.list_objects(Bucket=bucket)["Contents"]:
    print(key["Key"])

data/0026358a-7ad6-40a2-8705-f0f3610e9a8f.dcm
data/00f9eb96-0eac-4485-9273-edab2df09ec5.dcm
data/03f1f601-95a6-4e1f-b9b3-8605e60b2425.dcm
data/042818b7-04f4-41e1-bec9-9f0e5396adef.dcm
data/04c00891-1710-4ea3-acad-6001a287f73c.dcm
data/04ed79a6-ac5f-4dc1-8833-86a238d8ba8f.dcm
data/05e1f3b7-752b-4c7d-8114-7d7dbb50f89a.dcm
data/06cbcff5-f280-4782-87bf-fc050b9ff4d6.dcm
data/07857d71-5362-4c39-97c0-e8a27be4ac90.dcm
data/07c5c1f0-edf8-4ff9-9866-4796db373b45.dcm
data/08028bf1-d024-47d5-984f-fea252cf1128.dcm
data/0c37ec91-ecdc-487a-a4e1-1ed6a4ac3f81.dcm
data/0ce63f24-4224-4571-8126-7c5873693f30.dcm
data/0d242ebd-b777-434f-86c9-b2050a594d31.dcm
data/0ef30c30-446f-42d4-a35a-17417c054f8f.dcm
data/0f433a3e-093e-4818-bb04-6e8a16de0b43.dcm
data/0ffa8cdf-751a-4257-a888-755945459246.dcm
data/10979c8c-131d-478d-80f6-97985ce21c0b.dcm
data/110e4b83-6a38-443c-9012-967e09743483.dcm
data/11ef2bb6-aa69-495d-81d7-e42f6d325c99.dcm
data/12179788-aa34-454b-bbb1-fa512015c13a.dcm
data/12eabc77-768a-44c5-af33-19c89

In [8]:
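# Download every object in the bucket to a local path that mirrors its S3 key
import os

import boto3
import botocore

BUCKET_NAME = bucket
# exist_ok=True lets the cell be re-run without raising FileExistsError
os.makedirs("data", exist_ok=True)
os.makedirs("labels", exist_ok=True)

conn = boto3.client("s3")
s3 = boto3.resource("s3")
c = 0
for key in conn.list_objects(Bucket=BUCKET_NAME)["Contents"]:
    KEY = key["Key"]
    print(KEY)
    if KEY.endswith("/"):
        continue  # skip zero-byte "folder" placeholder objects
    # Create the local parent directory (e.g. data/ or labels/) if needed
    parent = os.path.dirname(KEY)
    if parent:
        os.makedirs(parent, exist_ok=True)
    try:
        s3.Bucket(BUCKET_NAME).download_file(KEY, KEY)
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "404":
            print("The object does not exist.")
        else:
            raise
    c = c + 1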
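The listing and download loops above call `list_objects` once, and a single call returns at most 1,000 keys, so on a bucket holding more objects they would stop early without any error. A minimal sketch using boto3's built-in paginator for `list_objects_v2`; the helper name `iter_keys` is hypothetical.

```python
def iter_keys(s3_client, bucket_name, prefix=""):
    """Yield every object key under `prefix`, following pagination.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

It can stand in for the `conn.list_objects(...)["Contents"]` loop, for example `for KEY in iter_keys(boto3.client("s3"), bucket, prefix="data/"): ...`.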

In [11]:

# DICOM to PNG
!pip install pydicom
import os
import pickle

import numpy as np
from PIL import Image
from pydicom import dcmread
from pydicom.pixel_data_handlers.util import apply_voi_lut


def read_xray(path, voi_lut=True, fix_monochrome=True):
    """Read a DICOM file; return (uint8 pixel array, PatientName), or None if it cannot be read."""
    try:
        print("Converting to PNG .........................")
        dicom = dcmread(path, force=True)
        print(dicom.SOPInstanceUID, ">>>>>>", dicom.StudyInstanceUID, ">>>>>", dicom.SeriesInstanceUID)
        # Apply the VOI LUT (windowing) only if the file actually carries one
        if voi_lut and len(dicom.get("VOILUTSequence", [])):
            data = apply_voi_lut(dicom.pixel_array, dicom)
        else:
            data = dicom.pixel_array
        # MONOCHROME1 displays low values as white; invert so low values are dark
        if fix_monochrome and dicom.PhotometricInterpretation == "MONOCHROME1":
            data = np.amax(data) - data
        # Rescale to 0..255, guarding against a constant (blank) image
        data = data - np.min(data)
        if np.max(data) > 0:
            data = data / np.max(data)
        data = (data * 255).astype(np.uint8)
        return data, dicom.PatientName
    except Exception as e:
        print(e, ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
        return None

Collecting pydicom
  Downloading pydicom-2.4.1-py3-none-any.whl (1.8 MB)
Installing collected packages: pydicom
Successfully installed pydicom-2.4.1
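The intensity handling inside `read_xray` can be pulled out into a pure NumPy function, which makes it easy to check on a toy array. This is a sketch: `to_uint8` is a hypothetical helper, and it does not apply a VOI LUT.

```python
import numpy as np


def to_uint8(pixels, monochrome1=False):
    """Rescale a pixel array to 8-bit, following the same steps as read_xray."""
    data = np.asarray(pixels, dtype=np.float64)
    if monochrome1:
        # MONOCHROME1 displays low values as white; invert so low values are dark
        data = data.max() - data
    data = data - data.min()
    peak = data.max()
    if peak > 0:  # a constant image stays all zeros instead of dividing by zero
        data = data / peak
    # astype truncates toward zero, so 63.75 becomes 63
    return (data * 255).astype(np.uint8)


print(to_uint8([[0, 50], [100, 200]]))
```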

In [12]:
# Convert the DICOMs to PNG and store them in training_data_png/
import os
from glob import glob

from PIL import Image

files_ = glob("data/*.dcm")
os.makedirs("training_data_png", exist_ok=True)
print(files_)
c = 0
for i in files_:
    print("\n>>> ", i)
    result = read_xray(i)
    if not isinstance(result, tuple):
        # read_xray could not decode this file; skip it
        continue
    img, patientname = result
    # read_xray already returns uint8 values rescaled to 0..255,
    # so the array can be saved directly (no pickle round-trip or np.float needed)
    filename = str(i).split("/")[1].split("_")[0]
    print("\n>> filename", filename)
    try:
        Image.fromarray(img).save(f"training_data_png/{filename}.png")
    except Exception as e:
        print(e)
        continue
    c = c + 1
    print(c, "Done")

['data/220395be-67bf-4352-9165-7882eb8a6f01.dcm', 'data/25fcfc3d-faf5-420c-bcaa-efe3c35a0027.dcm', 'data/cde17749-4a0f-4624-a555-289ee6820ade.dcm', 'data/dc042ef0-79da-4cfc-bfa9-e7ddfb74f761.dcm', 'data/9807e395-2211-40c3-9885-31c365504e65.dcm', 'data/79359da4-3e69-47d4-8221-0c237bfafdaf.dcm', 'data/23c3ed7a-5da6-4b62-ae23-e39c58348957.dcm', 'data/9abd652b-33cb-4131-a9bf-9b6442c0201d.dcm', 'data/f3dfab78-3666-4e54-bfe2-7b5e9acf1521.dcm', 'data/528ef9e9-9b34-4d8e-857b-8734f94850f0.dcm', 'data/7836d5bc-617d-4a4b-9592-7bd96af6d476.dcm', 'data/d37d05f9-8979-466a-beeb-1ca9fb38b4d9.dcm', 'data/2a2b3645-0860-4b0e-a6eb-51b89e2a12a5.dcm', 'data/c1870bd4-9e80-4db7-a9ad-9565b76613b3.dcm', 'data/39f9d924-dd5e-4b6f-93d4-fdd6f593080a.dcm', 'data/b465ca93-de94-4218-88e3-0a8ad12f793b.dcm', 'data/c765b546-68b4-4221-827d-ad51d72845c6.dcm', 'data/74049f1b-5826-4bae-af52-abc542fd3dd4.dcm', 'data/c926e886-d7ac-4eca-9ba7-90b4fa510835.dcm', 'data/bdc7e41f-3b1d-426b-91a2-e6d1547f6dad.dcm', 'data/5613c36b-b003

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations



>> filename 220395be-67bf-4352-9165-7882eb8a6f01.dcm
1 Done

>>>  data/25fcfc3d-faf5-420c-bcaa-efe3c35a0027.dcm
Converting to PNG .........................
2.43.812.8930.67771.170953.6827875.86772537978690104226974692459 >>>>>> 9.21.880.7257.13475.143829.5741880.39160383226915128853849442907 >>>>> 1.12.547.1884.57148.533368.2261967.95395407883117570671724334316

>> filename 25fcfc3d-faf5-420c-bcaa-efe3c35a0027.dcm
2 Done

>>>  data/cde17749-4a0f-4624-a555-289ee6820ade.dcm
Converting to PNG .........................
4.21.355.9222.34912.324311.8584344.81550036432378312720258838167 >>>>>> 8.24.380.7346.89430.526932.2935806.74519318306962707543314275776 >>>>> 1.17.478.2205.55414.591576.9212617.69808781944603168743602665431

>> filename cde17749-4a0f-4624-a555-289ee6820ade.dcm
3 Done

>>>  data/dc042ef0-79da-4cfc-bfa9-e7ddfb74f761.dcm
Converting to PNG .........................
5.58.691.8966.65820.108092.4464498.26236783905020340898713896204 >>>>>> 1.17.426.1379.56997.944087.1843893.934163

### Uploading the data to S3
We use the `sagemaker.Session.upload_data` function to upload the PNG files to S3. It returns the S3 URI of the uploaded data, which is the location the training job reads from later.


In [13]:
from glob import glob
import pandas as pd
labels = pd.read_csv("./labels/labels.csv",header=1)
print(labels)


   0 Case Number Patient ID Patient Name  \
0  1  Case 10375  ssss-0006    ssss-0006   
1  2  Case 10377  ssss-0008    ssss-0008   
2  3  Case 10378  ssss-0009    ssss-0009   
3  4  Case 10374  ssss-0005    ssss-0005   
4  5  Case 10373  ssss-0004    ssss-0004   
5  6  Case 10370       Anon        AA001   
6  7  Case 10371  ssss-0002    ssss-0002   
7  8  Case 10376  ssss-0007    ssss-0007   
8  9  Case 10372  ssss-0003    ssss-0003   

                                    StudyInstanceUID           User  Comments  \
0  3.73.860.9192.16202.831749.3290149.62853582098...  sudo@carpl.ai       NaN   
1  1.86.711.5694.23884.235814.2324468.17132268458...  sudo@carpl.ai       NaN   
2  6.42.185.8295.15199.978388.6046825.48050093409...  sudo@carpl.ai       NaN   
3  3.25.639.9624.60891.908848.1133013.29224464766...  sudo@carpl.ai       NaN   
4  7.39.190.7770.15802.706866.7064726.62775422404...  sudo@carpl.ai       NaN   
5  1.2.840.113619.2.203.4.2147483647.1474011852.1...  sudo@carpl.ai      

In [13]:
files_ = glob('training_data_png/*.png',recursive=True)

for file in files_:
    sagemaker_session.upload_data(path=file, bucket=bucket, key_prefix="data/train")
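`upload_data` also accepts a directory. In that case it uploads every file under it and returns the S3 URI of the prefix, so the per-file loop can be collapsed into one call. A minimal sketch; `upload_training_pngs` is a hypothetical wrapper, and `session` is expected to be the `sagemaker.Session` created in Setup.

```python
def upload_training_pngs(session, bucket_name, local_dir="training_data_png", key_prefix="data/train"):
    """Upload every file in `local_dir` and return the S3 URI of the prefix.

    `session` is expected to be a sagemaker.Session; upload_data walks the directory itself.
    """
    s3_uri = session.upload_data(path=local_dir, bucket=bucket_name, key_prefix=key_prefix)
    print("Uploaded training data to", s3_uri)
    return s3_uri
```

For example, `train_uri = upload_training_pngs(sagemaker_session, bucket)` gives a URI that can be passed as the `"training"` channel in `estimator.fit`.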

## Train
### Training script
The `classifier.py` script provides all the code we need for training and hosting a SageMaker model (`model_fn` function to load a model).
The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties about the training environment through various environment variables, such as:

* `SM_MODEL_DIR`: A string representing the path to the directory to write model artifacts to.
  These artifacts are uploaded to S3 for model hosting.
* `SM_NUM_GPUS`: The number of GPUs available in the current container.
* `SM_CURRENT_HOST`: The name of the current container on the container network.
* `SM_HOSTS`: A JSON-encoded list containing all the hosts.

Supposing one input channel, 'training', was used in the call to the PyTorch estimator's `fit()` method, the following will be set, following the format `SM_CHANNEL_[channel_name]`:

* `SM_CHANNEL_TRAINING`: A string representing the path to the directory containing data in the 'training' channel.
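Inside the training script these are ordinary environment variables. A minimal sketch of collecting them in one place; `training_env` is a hypothetical helper, and the fallback values (`/opt/ml/model`, `algo-1`, and so on) are only placeholders so the function also runs outside SageMaker.

```python
import json
import os


def training_env():
    """Read the SageMaker training environment variables listed above.

    The defaults are placeholders for local runs, not values SageMaker guarantees.
    """
    return {
        "model_dir": os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
        "num_gpus": int(os.environ.get("SM_NUM_GPUS", "0")),
        "current_host": os.environ.get("SM_CURRENT_HOST", "algo-1"),
        # SM_HOSTS is a JSON-encoded list, so it must be decoded
        "hosts": json.loads(os.environ.get("SM_HOSTS", '["algo-1"]')),
        "train_dir": os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"),
    }
```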

For more information about training environment variables, please visit [SageMaker Containers](https://github.com/aws/sagemaker-containers).

A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model to `model_dir` so that it can be hosted later. Hyperparameters are passed to your script as arguments and can be retrieved with an `argparse.ArgumentParser` instance.
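A minimal sketch of such a parser, assuming the two hyperparameters this notebook passes to the estimator (`epochs` and `backend`), which the SageMaker PyTorch container hands to the script as command-line flags such as `--epochs 1 --backend gloo`. The `--model-dir` and `--train` flag names are illustrative and are not taken from `classifier.py`.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters from the estimator arrive as command-line flags
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--backend", type=str, default=None)
    # Locations default to the SageMaker environment variables (placeholders for local runs)
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"))
    return parser.parse_args(argv)
```

Passing `argv` explicitly, as in `parse_args(["--epochs", "3"])`, makes the parser easy to exercise in a notebook; inside the container it reads `sys.argv`.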

Because SageMaker imports the training script, put your training code in a main guard (``if __name__=='__main__':``) if you use the same script to host your model, as we do in this example. Otherwise SageMaker may inadvertently run your training code at the wrong point in execution.

For example, the script run by this notebook:

In [14]:
!pygmentize code/classifier.py

import argparse
import json
import os
import numpy as np
import time
import sys
import csv
import cv2
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import tor

### Run training in SageMaker

The `PyTorch` class lets us run our training script as a training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, the training instance type, and hyperparameters. Here we run the training job on a single `ml.m5.large` instance (a GPU alternative, `ml.g4dn.xlarge`, is left commented out), but this example can be run on one or more CPU or GPU instances ([full list of available instances](https://aws.amazon.com/sagemaker/pricing/instance-types/)). The `hyperparameters` parameter is a dict of values that is passed to your training script; you can see how these values are read in the `classifier.py` script above.


In [15]:
!pip install --upgrade sagemaker==2.152.0

Collecting sagemaker==2.152.0
  Downloading sagemaker-2.152.0.tar.gz (751 kB)
  Preparing metadata (setup.py) ... done
Collecting attrs<23,>=20.3.0 (from sagemaker==2.152.0)
  Downloading attrs-22.2.0-py3-none-any.whl (60 kB)
Collecting importlib-metadata<5.0,>=1.4.0 (from sagemaker==2.152.0)
  Downloading importlib_metadata-4.13.0-py3-none-any.whl (23 kB)
Collecting PyYAML==5.4.1 (from sagemaker==2.152.0)
  Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)
Collecting urllib3<1.27,>=1.25.4 (from botocore<1.30.0,>=1.29.154->boto3<2.0,>=1.26.28->sagemaker==2.152.0)
  Downloading u

In [16]:
from sagemaker.pytorch import PyTorch

In [17]:
!pip show sagemaker | grep Version

Version: 2.152.0


## Please check pricing before running: https://aws.amazon.com/sagemaker/pricing/

In [23]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="classifier.py",
    role=role,
    py_version="py38",
    framework_version="1.12.0",
    instance_count=1,
    instance_type="ml.m5.large",
    # instance_type="ml.g4dn.xlarge",
    hyperparameters={"epochs": 1, "backend": "gloo"},
    dependencies=['code/requirements.txt'],
    source_dir = "code",
    max_run=86400,
)

In [24]:
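estimator.fit(
    {
        # All three channels point at the same prefix in this run, so the
        # "testing" and "validating" data are not held out from training
        "training": "s3://" + bucket + "/data/train",
        "testing": "s3://" + bucket + "/data/train",
        "validating": "s3://" + bucket + "/data/train",
    }
)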

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.


Using provided s3_resource


INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: pytorch-training-2023-06-27-07-20-10-603


2023-06-27 07:20:12 Starting - Starting the training job...
2023-06-27 07:20:27 Starting - Preparing the instances for training......
2023-06-27 07:21:17 Downloading - Downloading input data...
2023-06-27 07:22:02 Training - Downloading the training image...
2023-06-27 07:22:27 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-06-27 07:22:35,364 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2023-06-27 07:22:35,366 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-06-27 07:22:35,374 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2023-06-27 07:22:35,376 sagemaker_pytorch_container.training INFO     Invoking user training script.
2023-06-27 07:22:36,032 sagemaker-training-too

## Host your model in SageMaker
### Create endpoint
After training, we use the `PyTorch` estimator object to build and deploy a `PyTorchPredictor`. This creates a SageMaker endpoint -- a hosted prediction service that we can use to perform inference.

As mentioned above, `classifier.py` implements the required `model_fn`. We use the default implementations of `input_fn`, `predict_fn`, `output_fn` and `transform_fn` defined in [sagemaker-pytorch-containers](https://github.com/aws/sagemaker-pytorch-containers).

The arguments to the `deploy` function set the number and type of instances used for the endpoint. These do not need to match the values used for the training job. For example, you can train a model on GPU-based instances and then deploy the endpoint to CPU-based instances, as long as you return or save your model as a CPU model, as we do in `classifier.py`. Here we deploy the model to a single `ml.c5.2xlarge` instance.

In [18]:
estimator.__dict__

{'framework_version': '1.12.0',
 'py_version': 'py38',
 'instance_count': 1,
 'instance_type': 'ml.m5.large',
 'keep_alive_period_in_seconds': None,
 'instance_groups': None,
 'volume_size': 30,
 'max_run': 86400,
 'input_mode': 'File',
 'metric_definitions': None,
 'model_uri': None,
 'model_channel_name': 'model',
 'code_uri': None,
 'code_channel_name': 'code',
 'source_dir': 'code',
 'git_config': None,
 'container_log_level': 20,
 '_hyperparameters': {'epochs': 1,
  'backend': 'gloo',
  'sagemaker_submit_directory': 's3://sagemaker-ap-south-1-023180687239/pytorch-training-2023-06-27-07-00-46-005/source/sourcedir.tar.gz',
  'sagemaker_program': 'classifier.py',
  'sagemaker_container_log_level': 20,
  'sagemaker_job_name': 'pytorch-training-2023-06-27-07-00-46-005',
  'sagemaker_region': 'ap-south-1'},
 'code_location': None,
 'entry_point': 'classifier.py',
 'dependencies': ['code/requirements.txt'],
 'uploaded_code': UserCode(s3_prefix='s3://sagemaker-ap-south-1-023180687239/pyto

In [19]:
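# Name of the most recent training job launched by estimator.fit()
# (public attribute instead of the private _hyperparameters dict)
sagemaker_job_name = estimator.latest_training_job.name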

In [20]:
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.c5.2xlarge")

INFO:sagemaker:Creating model with name: pytorch-training-2023-06-27-07-11-06-277
INFO:sagemaker:Creating endpoint-config with name pytorch-training-2023-06-27-07-11-06-277
INFO:sagemaker:Creating endpoint with name pytorch-training-2023-06-27-07-11-06-277


----!

### Evaluate

You can use the test images to evaluate the endpoint. The accuracy of the model depends on how long it was trained; the job above ran for a single epoch.

In [21]:
predictor.__dict__

{'endpoint_name': 'pytorch-training-2023-06-27-07-11-06-277',
 'sagemaker_session': <sagemaker.session.Session at 0x7f7c888c7fd0>,
 'serializer': <sagemaker.serializers.NumpySerializer at 0x7f7c8a1db190>,
 'deserializer': <sagemaker.deserializers.NumpyDeserializer at 0x7f7c8a1db1d0>,
 '_endpoint_config_name': None,
 '_model_names': None,
 '_context': None}

In [22]:
predictor.endpoint_name 

'pytorch-training-2023-06-27-07-11-06-277'

## Invoke the endpoint and test your model in SageMaker

In [101]:
import json

import boto3

# Named `runtime` so it does not shadow boto3's `client` imported in earlier cells
runtime = boto3.client("sagemaker-runtime")

custom_attributes = "c000b4f9-df62-4c85-a0bf-7c525f9104a4"  # An example of a trace ID
endpoint_name = predictor.endpoint_name  # Your endpoint name
content_type = "application/json"  # The MIME type of the input data in the request body
accept = "application/json"  # The desired MIME type of the inference in the response
# Payload for inference: the URL of the image to classify
payload = json.dumps({"url": "https://storage.googleapis.com/kaggle-datasets-images/17810/23340/c8372ebbe20b0f671c2f3c501ba51412/dataset-cover.jpeg?t=2018-03-24-19-05-18"})
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    CustomAttributes=custom_attributes,
    ContentType=content_type,
    Accept=accept,
    Body=payload,
)

print(response)

{'ResponseMetadata': {'RequestId': 'a22946e8-216a-4d45-9441-850fac5bd421', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'a22946e8-216a-4d45-9441-850fac5bd421', 'x-amzn-invoked-production-variant': 'AllTraffic', 'date': 'Mon, 26 Jun 2023 09:06:28 GMT', 'content-type': 'application/json', 'content-length': '4210', 'connection': 'keep-alive'}, 'RetryAttempts': 0}, 'ContentType': 'application/json', 'InvokedProductionVariant': 'AllTraffic', 'Body': <botocore.response.StreamingBody object at 0x7f215b62bad0>}
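The request and the JSON decoding can be wrapped in one function so each test image needs a single call. A minimal sketch that reuses the `{"url": ...}` payload format and the JSON content types from the cell above; `classify_image_url` is a hypothetical name.

```python
import json


def classify_image_url(runtime_client, endpoint_name, image_url):
    """Send {"url": image_url} to the endpoint and return the decoded JSON reply.

    `runtime_client` is expected to be boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"url": image_url}),
    )
    # Body is a streaming object; json.load reads it to the end
    return json.load(response["Body"])
```

For example, `classify_image_url(runtime, predictor.endpoint_name, image_url)` returns the same dictionary as `json.load(response["Body"])` in the cells below.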


In [102]:
from pprint import pprint

In [103]:
pprint(response) 

{'Body': <botocore.response.StreamingBody object at 0x7f215b62bad0>,
 'ContentType': 'application/json',
 'InvokedProductionVariant': 'AllTraffic',
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '4210',
                                      'content-type': 'application/json',
                                      'date': 'Mon, 26 Jun 2023 09:06:28 GMT',
                                      'x-amzn-invoked-production-variant': 'AllTraffic',
                                      'x-amzn-requestid': 'a22946e8-216a-4d45-9441-850fac5bd421'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'a22946e8-216a-4d45-9441-850fac5bd421',
                      'RetryAttempts': 0}}


In [104]:
r = json.load(response["Body"])

In [105]:
r

{'response': {'findings': [{'name': 'No Finding', 'probability': 15.9},
   {'name': 'Enlarged Cardiomediastinum', 'probability': 6.0},
   {'name': 'Cardiomegaly', 'probability': 29.16},
   {'name': 'Lung Opacity', 'probability': 48.87},
   {'name': 'Lung Lesion', 'probability': 8.38},
   {'name': 'Edema', 'probability': 21.4},
   {'name': 'Consolidation', 'probability': 13.96},
   {'name': 'Pneumonia', 'probability': 47.53},
   {'name': 'Atelectasis', 'probability': 21.94},
   {'name': 'Pneumothorax', 'probability': 35.23},
   {'name': 'Pleural Effusion', 'probability': 28.07},
   {'name': 'Pleural Other', 'probability': 3.51},
   {'name': 'Fracture', 'probability': 8.78},
   {'name': 'Support Devices', 'probability': 3.43},
   {'name': 'ROIS', 'probability': '1'}],
  'rois': [{'finding_name': 'Abnormality',
    'type': 'Freehand',
    'points': [[1269.0, 199.0],
     [1268.0, 200.0],
     [1247.0, 200.0],
     [1246.0, 201.0],
     [1244.0, 201.0],
     [1243.0, 200.0],
     [1242.0, 
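The `findings` list above mixes numeric scores with a `ROIS` entry whose probability comes back as the string `'1'`, and the scores appear to be on a 0 to 100 scale. A minimal sketch for ranking the findings by score; `top_findings` is a hypothetical helper that relies only on the response structure printed above.

```python
def top_findings(result, k=3):
    """Return the k (name, probability) pairs with the highest probability.

    Entries with a non-numeric probability (such as "ROIS") are skipped.
    """
    findings = result["response"]["findings"]
    scored = [f for f in findings if isinstance(f["probability"], (int, float))]
    scored.sort(key=lambda f: f["probability"], reverse=True)
    return [(f["name"], f["probability"]) for f in scored[:k]]
```

Applied to the response above, `top_findings(r)` would rank Lung Opacity (48.87), Pneumonia (47.53) and Pneumothorax (35.23) first.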

## Push code to your S3 bucket

In [52]:
import boto3

In [None]:
import os

from boto3 import client

s3 = client("s3")
BUCKET_NAME = bucket
DIR_NAME = "code"  # local directory to upload
key = "artifact/code"  # S3 prefix to upload it under
# Iterate through the files in the directory
for root, dirs, files in os.walk(DIR_NAME):
    for file in files:
        # Construct the full local path of the file
        local_path = os.path.join(root, file)
        # Build the S3 key from the path relative to DIR_NAME (a plain str.replace
        # would also rewrite "code" inside names like "encoder"); S3 keys use "/"
        rel_path = os.path.relpath(local_path, DIR_NAME).replace(os.sep, "/")
        s3_path = f"{key}/{rel_path}"
        # Upload the file to S3
        s3.upload_file(local_path, BUCKET_NAME, s3_path)
        print(f"Uploaded {local_path} to s3://{BUCKET_NAME}/{s3_path}")

## Push model to your S3 bucket

In [None]:
modelbucket = estimator.output_path.split("/")[2]

In [None]:
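os.makedirs("model", exist_ok=True)
BUCKET_NAME = modelbucket
s3 = boto3.resource("s3")
# With the default output_path (the root of the SageMaker default bucket), the
# training artifact is stored at <job name>/output/model.tar.gz
s3.Bucket(BUCKET_NAME).download_file(sagemaker_job_name + "/output/model.tar.gz", "model/model.tar.gz")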
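Instead of rebuilding the key from `output_path` and the job name by hand, the artifact location can be read from `estimator.model_data`, which in the SDK version pinned above holds the S3 URI of `model.tar.gz` once training has finished, and then split into bucket and key. A minimal sketch; `split_s3_uri` is a hypothetical helper.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split "s3://bucket/some/key" into ("bucket", "some/key")."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    # netloc is the bucket; the path keeps a leading "/" that S3 keys do not have
    return parsed.netloc, parsed.path.lstrip("/")
```

For example, `model_bucket, model_key = split_s3_uri(estimator.model_data)` followed by `boto3.resource("s3").Bucket(model_bucket).download_file(model_key, "model/model.tar.gz")`.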

In [None]:
s3 = client('s3')
BUCKET_NAME = bucket
s3.upload_file("model/model.tar.gz", BUCKET_NAME, "model/model.tar.gz")
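A hosted endpoint is billed for as long as it runs (see the pricing link above), so once testing is done it is worth tearing it down. A minimal sketch using the `Predictor` methods `delete_model` and `delete_endpoint` from the SageMaker Python SDK; the wrapper name `delete_hosting_resources` is hypothetical. The model is deleted first because the SDK looks up the model names through the endpoint configuration, which `delete_endpoint` removes by default.

```python
def delete_hosting_resources(predictor):
    """Delete the model and the endpoint created by estimator.deploy().

    `predictor` is expected to be the object returned by estimator.deploy().
    """
    predictor.delete_model()
    predictor.delete_endpoint()
```

Calling `delete_hosting_resources(predictor)` leaves the training artifact in S3 untouched, so the model can be deployed again later.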