# Amazon SageMaker administration and security workshop: Lab 2

This notebook contains hands-on exercises for the workshop **Amazon SageMaker administration and security** – Lab 2.

## Import packages and load variables

In [15]:
import boto3
import sagemaker
from sagemaker.network import NetworkConfig
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

sagemaker.__version__

'2.144.0'

In [10]:
%store -r 

%store

try:
    initialized
except NameError:
    print("++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOU HAVE TO RUN 01-lab-01 notebook         ")
    print("++++++++++++++++++++++++++++++++++++++++++")

Stored variables and their in-db values:
bucket_name                   -> 'sagemaker-us-east-1-949335012047'
bucket_prefix                 -> 'from-idea-to-prod/xgboost'
domain_id                     -> 'd-dech5fdx5938'
initialized                   -> True
input_s3_url                  -> 's3://sagemaker-us-east-1-949335012047/sm-admin-wo
region                        -> 'us-east-1'
sm_role                       -> 'arn:aws:iam::949335012047:role/sagemaker-admin-wo
target_col                    -> 'y'
test_s3_url                   -> 's3://sagemaker-us-east-1-949335012047/sm-admin-wo
train_s3_url                  -> 's3://sagemaker-us-east-1-949335012047/sm-admin-wo
validation_s3_url             -> 's3://sagemaker-us-east-1-949335012047/sm-admin-wo


In [11]:
# Set up the sessions, clients, and variables needed to interact with SageMaker
boto_session = boto3.Session()
region = boto_session.region_name
sm_session = sagemaker.Session()
bucket_name = sm_session.default_bucket()
bucket_prefix = "sm-admin-workshop/xgboost"
sm_client = boto_session.client("sagemaker")
ssm = boto_session.client("ssm")
sm_role = sagemaker.get_execution_role()

## Data protection

### Enforce encryption of input data

Follow the instructions in Lab 2 of the workshop.
Add the `Deny` inline policy to the user profile execution role, using the `ebs_key_arn` value retrieved in the following code cells.

In [30]:
# Account id and region
account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.Session().region_name

account_id, region

('949335012047', 'us-east-1')

In [31]:
# Read the networking and KMS parameters for the workshop from SSM Parameter Store
param_prefix = f"sagemaker-admin-workshop-{region}-{account_id}"
security_group_ids = ssm.get_parameter(Name=f"{param_prefix}-sagemaker-sg-ids")["Parameter"]["Value"]
private_subnet_ids = ssm.get_parameter(Name=f"{param_prefix}-private-subnet-ids")["Parameter"]["Value"]
ebs_key_arn = ssm.get_parameter(Name=f"{param_prefix}-kms-ebs-key-arn")["Parameter"]["Value"]

security_group_ids, private_subnet_ids, ebs_key_arn

('sg-094cc28a340257059',
 'subnet-0dbfb5fab7b6ae14e,subnet-0324af06a736e9404',
 'arn:aws:kms:us-east-1:949335012047:key/11e91a97-d3d6-4089-a426-a8354a453965')

Now attach the inline policy, which enforces use of the volume KMS key identified by `ebs_key_arn`.
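
A minimal sketch of what such a `Deny` statement could look like, built as a Python dictionary so the key ARN can be substituted in. The function name `build_volume_key_deny_policy`, the statement ID, and the list of actions are assumptions for illustration; the policy given in the workshop instructions may differ and takes precedence. `sagemaker:VolumeKmsKey` is the IAM condition key that carries the volume key of a job request.

```python
import json


def build_volume_key_deny_policy(key_arn):
    """Return an IAM policy document that denies job creation unless key_arn is the volume key.

    Hypothetical helper: the statement ID and the action list are illustrative assumptions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyJobsWithoutVolumeKey",
                "Effect": "Deny",
                "Action": [
                    "sagemaker:CreateProcessingJob",
                    "sagemaker:CreateTrainingJob",
                ],
                "Resource": "*",
                # A negated operator also matches when the key is absent from the
                # request, so jobs that omit the volume key are denied as well.
                "Condition": {
                    "StringNotEquals": {"sagemaker:VolumeKmsKey": key_arn}
                },
            }
        ],
    }


# Placeholder ARN for illustration; in the notebook you would pass ebs_key_arn.
example_key_arn = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
policy_document = build_volume_key_deny_policy(example_key_arn)
print(json.dumps(policy_document, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy on the execution role.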

In [20]:
# Construct the NetworkConfig with the values for your environment
network_config = NetworkConfig(
    enable_network_isolation=False,
    security_group_ids=security_group_ids.split(","),
    subnets=private_subnet_ids.split(","),
    encrypt_inter_container_traffic=True,
)

In [21]:
framework_version = "0.23-1"
processing_instance_type = "ml.m5.large"
processing_instance_count = 1

In [22]:
# Define processing inputs and outputs
processing_inputs = [
        ProcessingInput(
            source=input_s3_url, 
            destination="/opt/ml/processing/input",
            s3_input_mode="File",
            s3_data_distribution_type="ShardedByS3Key"
        )
]

processing_outputs = [
        ProcessingOutput(
            output_name="train_data", 
            source="/opt/ml/processing/output/train",
            destination=train_s3_url,
        ),
        ProcessingOutput(
            output_name="validation_data", 
            source="/opt/ml/processing/output/validation", 
            destination=validation_s3_url
        ),
        ProcessingOutput(
            output_name="test_data", 
            source="/opt/ml/processing/output/test", 
            destination=test_s3_url
        ),
]

Create a processor without specifying a volume KMS key.

In [32]:
# Create a processor
sklearn_processor = SKLearnProcessor(
    framework_version=framework_version,
    role=sm_role,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count, 
    base_job_name='sm-admin-workshop-processing',
    sagemaker_session=sm_session,
    network_config=network_config,
    # volume_kms_key=ebs_key_arn,  # omitted on purpose so that the Deny policy rejects the job
)

INFO:sagemaker.image_uris:Defaulting to only available Python version: py3


This run fails because the `Deny` policy on the execution role requires the designated volume KMS key, which the processor does not provide.

In [None]:
# Start the processing job
sklearn_processor.run(
        inputs=processing_inputs,
        outputs=processing_outputs,
        code='preprocessing.py',
        wait=True,
)
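
To confirm that the failure comes from the `Deny` policy rather than from an unrelated problem, you can inspect the error code carried by the raised exception. The sketch below assumes the SDK surfaces a `botocore` `ClientError`, whose `response` attribute holds a dictionary with an `Error` entry. The helper name `aws_error_code` and the stand-in exception class are hypothetical and only keep the example runnable without AWS access; the error code `AccessDeniedException` is likewise an assumption about what the service returns.

```python
def aws_error_code(exc):
    """Return the AWS error code carried by an exception, or None if there is none."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


class FakeClientError(Exception):
    """Stand-in with the same `response` layout as botocore's ClientError."""

    def __init__(self, code, message):
        super().__init__(message)
        self.response = {"Error": {"Code": code, "Message": message}}


try:
    # In the notebook, this would be the sklearn_processor.run(...) call.
    raise FakeClientError("AccessDeniedException", "explicit deny in an identity-based policy")
except Exception as exc:
    code = aws_error_code(exc)
    print(f"Job rejected with error code: {code}")
```

Exceptions that carry no `response` attribute yield `None`, so the helper can be applied to any exception without further checks.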

Create a new processor with the intended value of `volume_kms_key` and run the processing job.

In [35]:
# Create a processor that uses the required volume KMS key
sklearn_processor = SKLearnProcessor(
    framework_version=framework_version,
    role=sm_role,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name='sm-admin-workshop-processing',
    sagemaker_session=sm_session,
    network_config=network_config,
    volume_kms_key=ebs_key_arn,
)

INFO:sagemaker.image_uris:Defaulting to only available Python version: py3


In [None]:
# This call will succeed and the processing job will finish
sklearn_processor.run(
        inputs=processing_inputs,
        outputs=processing_outputs,
        code='preprocessing.py',
        wait=True,
)
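
After the job finishes, you can check which key was applied to the processing volume. The `DescribeProcessingJob` response reports it under `ProcessingResources.ClusterConfig.VolumeKmsKeyId`. The sketch below uses a hand-written sample response so that it runs without AWS access; the helper name `volume_kms_key_of` and the sample values are hypothetical.

```python
def volume_kms_key_of(job_description):
    """Extract the volume KMS key from a DescribeProcessingJob response, or None if unset."""
    cluster_config = job_description.get("ProcessingResources", {}).get("ClusterConfig", {})
    return cluster_config.get("VolumeKmsKeyId")


# Sample response fragment for illustration. In the notebook, the real one could be fetched with:
#   sm_client.describe_processing_job(ProcessingJobName=sklearn_processor.latest_job.job_name)
sample_key_arn = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
sample_description = {
    "ProcessingResources": {
        "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.large",
            "VolumeSizeInGB": 30,
            "VolumeKmsKeyId": sample_key_arn,
        }
    }
}

applied_key = volume_kms_key_of(sample_description)
print(f"Volume KMS key used by the job: {applied_key}")
```

Comparing the returned value with `ebs_key_arn` shows whether the job ran with the key that the policy enforces.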

## Data access control

## Data perimeter

In [36]:
bucket_name = sagemaker.Session().default_bucket()
bucket_name

'sagemaker-us-east-1-949335012047'

In [39]:
# Retrieve the ID of the S3 VPC endpoint (not the VPC itself)
s3_vpce_id = ssm.get_parameter(Name=f"sagemaker-admin-workshop-{region}-{account_id}-s3-vpce-id")["Parameter"]["Value"]

s3_vpce_id

'vpce-09ea45d1cda545dbc'

In [49]:
!aws s3 ls s3://{bucket_name}

                           PRE from-idea-to-prod-processing-2023-04-10-12-10-14-429/
                           PRE from-idea-to-prod-processing-2023-04-10-12-11-21-460/
                           PRE from-idea-to-prod-processing-2023-04-10-12-13-11-784/
                           PRE from-idea-to-prod-processing-2023-04-10-12-14-05-826/
                           PRE from-idea-to-prod-processing-2023-04-10-12-15-04-891/
                           PRE from-idea-to-prod-processing-2023-04-10-12-49-20-993/
                           PRE from-idea-to-prod-processing-2023-04-10-12-50-33-331/
                           PRE from-idea-to-prod-processing-2023-04-10-12-52-12-199/
                           PRE from-idea-to-prod-processing-2023-04-10-12-52-20-600/
                           PRE from-idea-to-prod-processing-2023-04-10-12-52-26-829/
                           PRE from-idea-to-prod-processing-2023-04-10-12-58-04-089/
                           PRE from-idea-to-prod-processing-2023-

In [51]:
!aws s3 ls s3://sagemaker-eu-central-1-949335012047

                           PRE PreprocessData-e5d4c2b08e616c264c9dc5053871519a/
                           PRE from-idea-to-prod-processing-2022-09-28-10-08-57-912/


## Resource isolation using tags

## Shutdown kernel

In [None]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>