# Start Here Notebook


This notebook checks that your environment is ready for the workshop.

It creates and stores the following variables:

* `project_prefix`: Prefix used to name workshop resources, e.g. S3 paths, training jobs, endpoints, and pipelines.
* `bucket_prefix`: The S3 location used throughout the examples, composed of the default SageMaker bucket and `project_prefix`.
* `mlflow_name`: The name of the MLflow tracking server.
* `mlflow_arn`: The ARN of the MLflow tracking server used to track experiments and runs.
* `domain_id`: The SageMaker domain ID.
* `region`: The current AWS region.



To run this notebook and all other notebooks in the workshop, use the `Python 3` kernel in JupyterLab.

## Setup
Check the versions of the SageMaker Python SDK and boto3 installed in your environment.

<div class="alert alert-info"> 💡 The workshop and all notebooks were tested with SageMaker Distribution `1.11` and the SageMaker Python SDK (the `sagemaker` package) version 2.219.0. The notebooks don't pin the `sagemaker` version. If you encounter any incompatibility issues, install the tested version by running: <code>%pip install sagemaker==2.219.0</code>
</div>

### Import packages

In [3]:
import time
import os
import json
import boto3
import numpy as np
import pandas as pd
import sagemaker
from time import gmtime, strftime, sleep

# Show the installed SageMaker Python SDK and boto3 versions
(sagemaker.__version__, boto3.__version__)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


('2.219.0', '1.34.162')
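Because the notebooks don't pin the `sagemaker` version, you may want a warning when the installed SDK differs from the tested one. The following is a minimal sketch under stated assumptions: `parse_version` and `check_sdk_version` are hypothetical helpers, and comparing only the major and minor numbers is a simplification (the `packaging` library offers a more complete version comparison).

```python
import warnings

# Version stated in the note above; the helpers below are hypothetical, not workshop code.
TESTED_SAGEMAKER_VERSION = "2.219.0"


def parse_version(version: str) -> tuple:
    """Turn '2.219.0' into (2, 219, 0); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_sdk_version(installed: str, tested: str = TESTED_SAGEMAKER_VERSION) -> bool:
    """Return True if major.minor match the tested version; otherwise warn and return False."""
    if parse_version(installed)[:2] == parse_version(tested)[:2]:
        return True
    warnings.warn(
        f"sagemaker {installed} differs from the tested version {tested}; "
        f"if you hit incompatibilities, run: %pip install sagemaker=={tested}"
    )
    return False
```

In the notebook you would call `check_sdk_version(sagemaker.__version__)` right after the import cell.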

### Set constants

In [22]:
# Get the variables you need to interact with the SageMaker service
boto_session = boto3.Session()
sm_client = boto_session.client("sagemaker")
sm_session = sagemaker.Session(boto_session=boto_session)
bucket_name = sm_session.default_bucket()  # default SageMaker bucket for this account and region

region = boto_session.region_name
project_prefix = "amzn"
bucket_prefix = f"{bucket_name}/{project_prefix}"

initialized = True
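`bucket_prefix` is a plain `<bucket>/<prefix>` string, not an `s3://` URI. Where an API expects a full URI, a small helper such as the hypothetical `s3_uri` below could build one. This is a sketch, not part of the workshop code, and the bucket name in the example is illustrative.

```python
def s3_uri(bucket_prefix: str, *parts: str) -> str:
    """Join a '<bucket>/<prefix>' string and optional key parts into an s3:// URI.

    Hypothetical helper: leading/trailing slashes are stripped and empty parts skipped.
    """
    segments = [bucket_prefix.strip("/")] + [p.strip("/") for p in parts if p.strip("/")]
    return "s3://" + "/".join(segments)


# Example with an illustrative bucket name
print(s3_uri("sagemaker-us-east-1-111122223333/amzn", "data", "train.csv"))
# prints s3://sagemaker-us-east-1-111122223333/amzn/data/train.csv
```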

In [23]:
# Store some variables to keep the value between the notebooks

%store project_prefix
%store bucket_prefix
%store region
%store initialized

Stored 'project_prefix' (str)
Stored 'bucket_prefix' (str)
Stored 'region' (str)
Stored 'initialized' (bool)


### Get domain id
Many SageMaker Python SDK and boto3 SageMaker API calls require the `domain_id`. The notebook metadata file contains the `domain_id` value; the following code reads the file and extracts it.

In [26]:
NOTEBOOK_METADATA_FILE = "/opt/ml/metadata/resource-metadata.json"
domain_id = None

# Read the domain ID from the Studio notebook metadata file
if os.path.exists(NOTEBOOK_METADATA_FILE):
    with open(NOTEBOOK_METADATA_FILE, "r") as f:
        metadata = json.load(f)
        domain_id = metadata.get("DomainId")

assert domain_id, f"DomainId not found in {NOTEBOOK_METADATA_FILE}"
%store domain_id

Stored 'domain_id' (str)
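The same lookup can be wrapped in a function that returns `None` instead of failing when the metadata file is missing, for example when the code runs outside SageMaker Studio. This is a minimal sketch; `read_domain_id` is a hypothetical name, and it relies only on the `DomainId` key used above.

```python
import json
import os
from typing import Optional

NOTEBOOK_METADATA_FILE = "/opt/ml/metadata/resource-metadata.json"


def read_domain_id(path: str = NOTEBOOK_METADATA_FILE) -> Optional[str]:
    """Return the DomainId from the Studio metadata file, or None if it is unavailable."""
    if not os.path.exists(path):
        return None  # not running in Studio, or the file is not mounted
    with open(path, encoding="utf-8") as f:
        metadata = json.load(f)
    return metadata.get("DomainId")
```

Outside Studio, `read_domain_id()` returns `None`, so you would need to set `domain_id` yourself before calling APIs that require it.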


### Get MLflow server
If you're running an AWS-led workshop or used the provided CloudFormation template to provision your workshop environment, an MLflow tracking server should already be running. If you don't have an MLflow server, follow the [Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-create-tracking-server.html) to create one. Creating a new MLflow server can take up to 25 minutes.

Run the following code to get the name and ARN of the MLflow server.

In [38]:
def get_mlflow_server_arn():
    list_servers = boto3.client("sagemaker").list_mlflow_tracking_servers()['TrackingServerSummaries']

    # Expect exactly one MLflow server; if there are several, select the correct one explicitly
    assert len(list_servers) == 1, f"Expected one MLflow server, found {len(list_servers)}"
    mlflow_arn = list_servers[0]['TrackingServerArn']
    mlflow_name = list_servers[0]['TrackingServerName']
    print(f"MLflow Server Name: {mlflow_name}, ARN: {mlflow_arn}")
    return mlflow_name, mlflow_arn

mlflow_name, mlflow_arn = get_mlflow_server_arn()

MLflow Server Name: mlflow-d-ctulbmd27zmn, ARN: arn:aws:sagemaker:us-east-1:164342431904:mlflow-tracking-server/mlflow-d-ctulbmd27zmn


In [39]:
%store mlflow_name
%store mlflow_arn

Stored 'mlflow_name' (str)
Stored 'mlflow_arn' (str)
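If your account has more than one MLflow tracking server, the assertion in `get_mlflow_server_arn` fails. A minimal sketch of an alternative that selects a server by name from the `list_mlflow_tracking_servers()` response is shown here; `select_mlflow_server` is a hypothetical helper, and it assumes all servers fit in a single page of results (it ignores `NextToken`).

```python
from typing import Optional, Tuple


def select_mlflow_server(response: dict, name: Optional[str] = None) -> Tuple[str, str]:
    """Pick (name, arn) from a list_mlflow_tracking_servers() response.

    If `name` is given, return that server; otherwise require exactly one server.
    """
    all_servers = response.get("TrackingServerSummaries", [])
    servers = all_servers
    if name is not None:
        servers = [s for s in all_servers if s["TrackingServerName"] == name]
    if len(servers) != 1:
        found = [s["TrackingServerName"] for s in all_servers]
        raise ValueError(f"Expected exactly one matching MLflow server, found: {found}")
    return servers[0]["TrackingServerName"], servers[0]["TrackingServerArn"]
```

In the notebook you would pass it the API response, e.g. `mlflow_name, mlflow_arn = select_mlflow_server(sm_client.list_mlflow_tracking_servers(), name="<your-server-name>")`.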


### Check that Docker access is enabled

In [40]:
# Check that Docker access is enabled in the SageMaker domain
docker_settings = sm_client.describe_domain(DomainId=domain_id).get('DomainSettings', {}).get('DockerSettings')
docker_enabled = False

if docker_settings and docker_settings.get('EnableDockerAccess') == 'ENABLED':
    print(f"The docker access is ENABLED in the domain {domain_id}")
    docker_enabled = True

if not docker_enabled:
    raise Exception(f"You must enable Docker access in the domain {domain_id} to use Studio local mode")

The docker access is ENABLED in the domain d-ctulbmd27zmn


<div style="border: 4px solid coral; text-align: center; margin: auto;">
If the previous code cell raised an exception saying that Docker access is not enabled, you need to enable it. See [01-custom-environment-guidance.ipynb](01-custom-environment-guidance.ipynb) for instructions on how to do this.
</div>

In [32]:
import boto3

# Look up the AWS account that hosts the SageMaker Distribution images in the current region
REGION = boto3.session.Session().region_name
sagemaker_dist_repos = "/aws/service/sagemaker-distribution/ecr-account-id"
sm_dist_repo_account = boto3.client('ssm', region_name=REGION).get_parameter(Name=sagemaker_dist_repos)['Parameter']['Value']

SM_DIST_IMAGE=f"{sm_dist_repo_account}.dkr.ecr.{REGION}.amazonaws.com/sagemaker-distribution-prod:1.11.0-gpu"
print(f"Sagemaker distribution account for region: {REGION}: {sm_dist_repo_account}")
print(f"SM_DIST_IMAGE: {SM_DIST_IMAGE}")

Sagemaker distribution account for region: us-east-1: 885854791233
SM_DIST_IMAGE: 885854791233.dkr.ecr.us-east-1.amazonaws.com/sagemaker-distribution-prod:1.11.0-gpu


In [33]:
# check the updated settings
sm_client.describe_domain(DomainId=domain_id)['DomainSettings']

{'DockerSettings': {'EnableDockerAccess': 'ENABLED',
  'VpcOnlyTrustedAccounts': ['885854791233']}}
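The `describe_domain` output above lists the SageMaker Distribution account under `VpcOnlyTrustedAccounts`. If your domain's settings don't include it, the following minimal sketch builds the `DomainSettingsForUpdate` payload for boto3's `update_domain` call. The helper `docker_settings_update` is hypothetical; it assumes you want to keep any accounts that are already trusted and only append the new one.

```python
from typing import Optional


def docker_settings_update(domain_settings: Optional[dict], account_id: str) -> dict:
    """Build a DomainSettingsForUpdate payload that enables Docker access and
    adds `account_id` to VpcOnlyTrustedAccounts, keeping accounts already listed.

    The input dict is not modified.
    """
    current = (domain_settings or {}).get("DockerSettings") or {}
    trusted = list(current.get("VpcOnlyTrustedAccounts", []))  # copy, don't mutate input
    if account_id not in trusted:
        trusted.append(account_id)
    return {
        "DockerSettings": {
            "EnableDockerAccess": "ENABLED",
            "VpcOnlyTrustedAccounts": trusted,
        }
    }
```

In the notebook you would pass the result to `sm_client.update_domain(DomainId=domain_id, DomainSettingsForUpdate=docker_settings_update(sm_client.describe_domain(DomainId=domain_id)['DomainSettings'], sm_dist_repo_account))`, which requires the `sagemaker:UpdateDomain` permission.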

### Install Docker

In [34]:
%%bash

# see https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

## Currently only Docker version 20.10.X is supported in Studio: see https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-local.html
# pick the latest patch from:
# apt-cache madison docker-ce | awk '{ print $3 }' | grep -i 20.10
VERSION_STRING=5:20.10.24~3-0~ubuntu-jammy
sudo apt-get install docker-ce-cli=$VERSION_STRING docker-compose-plugin -y

# validate the Docker Client is able to access Docker Server at [unix:///docker/proxy.sock]
docker version

Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:2 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:5 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2372 kB]
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [3278 kB]
Get:7 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [3200 kB]
Get:8 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1451 kB]
Get:9 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [2648 kB]
Get:10 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1162 kB]
Get:11 http://archive.ubuntu.com/ubuntu jammy-backports/main amd64 Packages [81.4 kB]
Get:12 http://archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages [33.7 kB]
Fetched 14.6 MB in 2s (8

debconf: delaying package configuration, since apt-utils is not installed


Fetched 1347 kB in 1s (1821 kB/s)
Selecting previously unselected package openssl.
(Reading database ... 13790 files and directories currently installed.)
Preparing to unpack .../openssl_3.0.2-0ubuntu1.18_amd64.deb ...
Unpacking openssl (3.0.2-0ubuntu1.18) ...
Selecting previously unselected package ca-certificates.
Preparing to unpack .../ca-certificates_20240203~22.04.1_all.deb ...
Unpacking ca-certificates (20240203~22.04.1) ...
Setting up openssl (3.0.2-0ubuntu1.18) ...
Setting up ca-certificates (20240203~22.04.1) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78.)
debconf: falling back to frontend: Readline
Updating certificates in /etc/ssl/certs...
146 added, 0 removed; done.
Processing triggers for ca-certificates (20240203~22.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc

debconf: delaying package configuration, since apt-utils is not installed


Fetched 55.8 MB in 1s (97.6 MB/s)
Selecting previously unselected package docker-ce-cli.
(Reading database ... 14266 files and directories currently installed.)
Preparing to unpack .../docker-ce-cli_5%3a20.10.24~3-0~ubuntu-jammy_amd64.deb ...
Unpacking docker-ce-cli (5:20.10.24~3-0~ubuntu-jammy) ...
Selecting previously unselected package docker-compose-plugin.
Preparing to unpack .../docker-compose-plugin_2.29.7-1~ubuntu.22.04~jammy_amd64.deb ...
Unpacking docker-compose-plugin (2.29.7-1~ubuntu.22.04~jammy) ...
Setting up docker-compose-plugin (2.29.7-1~ubuntu.22.04~jammy) ...
Setting up docker-ce-cli (5:20.10.24~3-0~ubuntu-jammy) ...
Client: Docker Engine - Community
 Version:           20.10.24
 API version:       1.41
 Go version:        go1.19.7
 Git commit:        297e128
 Built:             Tue Apr  4 18:21:03 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          25.0.6
  API version:      1.44 (minimum ve
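To re-check the Docker client later (for example, from another notebook) without reading the full `docker version` output, a small sketch like the following could help. It assumes, as the comment in the install cell states, that Studio supports only Docker 20.10.x clients; `is_supported_docker_client` and `docker_client_version` are hypothetical names.

```python
import subprocess
from typing import Optional


def is_supported_docker_client(version: str) -> bool:
    """True for Docker 20.10.x client versions (the range noted in the install cell)."""
    return version.strip().startswith("20.10.")


def docker_client_version() -> Optional[str]:
    """Return the Docker client version, or None if the docker CLI is not installed."""
    try:
        result = subprocess.run(
            ["docker", "version", "--format", "{{.Client.Version}}"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return None  # docker CLI is not on PATH
    version = result.stdout.strip()
    return version or None
```

Usage: `v = docker_client_version(); print(v, is_supported_docker_client(v or ""))`.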

## Restart kernel

In [35]:
# Restart the kernel so that the newly installed packages are picked up
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

## Further workshop flow
Continue with the workshop by going to Module 01.
