# Build custom SageMaker image for geospatial processing

The [SageMaker Distribution](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-distribution.html) is a pre-built container for Studio JupyterLab apps that includes:
- Deep learning frameworks (PyTorch, TensorFlow, Keras) for a GPU or CPU distribution type
- Python ML packages (numpy, scikit-learn, pandas)
- JupyterLab IDE
  
All components are compatible and up-to-date. The SageMaker Distribution enables ML practitioners to get started quickly with their ML development in SageMaker AI Studio. 

If you need functionality that is different from what SageMaker Distribution provides, you can bring your own image with your custom extensions and packages. Custom images based on SageMaker Distribution work both for interactive notebooks in SageMaker AI Studio and for SageMaker AI jobs such as processing or training jobs, which enables a smooth transition from development in a notebook to production.

This notebook demonstrates how to create a custom SageMaker image for Studio notebooks with specific geospatial processing libraries.

See the [Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio](https://aws.amazon.com/blogs/machine-learning/create-custom-images-for-geospatial-analysis-with-amazon-sagemaker-distribution-in-amazon-sagemaker-studio/) blog post for a detailed overview of the custom image creation requirements and workflow.

## Setup environment

In [None]:
import os
import json
import boto3
import sagemaker
from packaging import version
from IPython.display import HTML

In [None]:
sm_client = boto3.client('sagemaker')

In [None]:
NOTEBOOK_METADATA_FILE = "/opt/ml/metadata/resource-metadata.json"
domain_id = None
space_name = None  # initialized so the check below works when the metadata file is missing

if os.path.exists(NOTEBOOK_METADATA_FILE):
    with open(NOTEBOOK_METADATA_FILE, "rb") as f:
        metadata = json.loads(f.read())
        domain_id = metadata.get('DomainId')
        space_name = metadata.get('SpaceName')
        print(f"SageMaker domain id: {domain_id}")

if not space_name:
    raise Exception("Cannot find the current space name. Make sure you run this notebook in a JupyterLab space in SageMaker Studio")
else:
    print(f"Space name: {space_name}")
    
r = sm_client.describe_space(DomainId=domain_id, SpaceName=space_name)
user_profile_name = r['OwnershipSettings']['OwnerUserProfileName']

assert(user_profile_name)
print(f"User profile: {user_profile_name}")

%store domain_id
%store space_name
%store user_profile_name

In [None]:
role = sagemaker.get_execution_role()

### Check if docker access is enabled in the domain

In [None]:
# Check that Docker access is enabled in the SageMaker domain
docker_settings = sm_client.describe_domain(DomainId=domain_id)['DomainSettings'].get('DockerSettings')
docker_enabled = False

if docker_settings:
    if docker_settings.get('EnableDockerAccess') in ['ENABLED']:
        print(f"The docker access is ENABLED in the domain {domain_id}")
        docker_enabled = True

if not docker_enabled:
    raise Exception("You must enable Docker access in the domain to use Studio local mode")

<div style="border: 4px solid coral; text-align: center; margin: auto;">
If the previous code cell raised an exception saying that Docker access is not enabled, you need to enable it. The following section explains how.
</div>

In [None]:
print(f"Domain id: {domain_id}")

### Enable docker access for the SageMaker domain

<div class="alert alert-info">You only need this section if the docker access is not enabled in the domain.
</div>

To update domain settings, you can use **one** of the following options.

#### Option 1: run `update_domain` in the notebook
To run the following code in the notebook, the execution role needs the `sagemaker:UpdateDomain` permission:

```python
import boto3

r = boto3.client('sagemaker').update_domain(
    DomainId=domain_id,
    DomainSettingsForUpdate={
        'DockerSettings': {
            'EnableDockerAccess':'ENABLED',
        }
    }
)
```

#### Option 2: run `aws sagemaker` CLI in the terminal
Make sure you run the AWS CLI in a terminal where you have the `sagemaker:UpdateDomain` permission.
Run the following command:

```
aws sagemaker update-domain --domain-id <DOMAIN-ID> --domain-settings-for-update DockerSettings={EnableDockerAccess='ENABLED'}
```

For example, you can run the command above in the [AWS CloudShell](https://aws.amazon.com/blogs/aws/aws-cloudshell-command-line-access-to-aws-resources/) in your AWS account.
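
A domain update is not instantaneous, so it can help to wait until the domain is back in service before re-running the Docker check. The following is a minimal sketch, not part of the original notebook. The helper name `wait_for_docker_access` and its timeout defaults are hypothetical, and it assumes the domain reports the status `Updating` until the change is applied. The client is passed in as a parameter, for example `boto3.client('sagemaker')`.

```python
import time


def wait_for_docker_access(sm_client, domain_id, timeout_s=300, poll_s=10):
    """Poll DescribeDomain until the domain is InService.

    Returns True if Docker access is enabled once the domain is in service.
    The function name and defaults are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        domain = sm_client.describe_domain(DomainId=domain_id)
        status = domain["Status"]
        if status == "InService":
            docker = domain.get("DomainSettings", {}).get("DockerSettings", {})
            return docker.get("EnableDockerAccess") == "ENABLED"
        if status in ("Failed", "Update_Failed"):
            raise RuntimeError(f"Domain {domain_id} is in status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Domain {domain_id} is still in status {status}")
        time.sleep(poll_s)


# Usage in the notebook (requires AWS credentials):
# wait_for_docker_access(sm_client, domain_id)
```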

### Install Docker

In [None]:
%%bash

# see https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

## Currently only Docker version 20.10.X is supported in Studio: see https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-local.html
# pick the latest patch from:
# apt-cache madison docker-ce | awk '{ print $3 }' | grep -i 20.10
VERSION_STRING=5:20.10.24~3-0~ubuntu-jammy
sudo apt-get install docker-ce-cli=$VERSION_STRING docker-compose-plugin -y

# validate the Docker Client is able to access Docker Server at [unix:///docker/proxy.sock]
docker version
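
Because Studio supports only the 20.10.X client line, you may want to check the installed version from Python before building. This is a minimal sketch, not part of the original notebook. The function names are hypothetical, and it assumes the `docker` CLI is on the `PATH` and supports the `--format` template shown.

```python
import subprocess


def is_supported_docker_version(client_version: str, supported_prefix: str = "20.10.") -> bool:
    """Return True if the client version string belongs to the supported release line."""
    return client_version.strip().startswith(supported_prefix)


def docker_client_version() -> str:
    """Ask the Docker CLI for its client version (assumes `docker` is installed)."""
    result = subprocess.run(
        ["docker", "version", "--format", "{{.Client.Version}}"],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# Usage in the notebook, after the installation cell has run:
# assert is_supported_docker_version(docker_client_version())
```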

## Get the current SageMaker Image

The custom image is based on the current SageMaker image. See the [Amazon SageMaker Distribution GitHub](https://github.com/aws/sagemaker-distribution/) for the full list of SageMaker distribution images.

In [None]:
# Retrieve the SageMaker image for the current space, extract the type (cpu/gpu) and the version
try:
    r = sm_client.describe_space(DomainId=domain_id, SpaceName=space_name)
    resource_spec = r['SpaceSettings']['JupyterLabAppSettings']['DefaultResourceSpec']
    sm_image = resource_spec['SageMakerImageArn']
    sm_dist_type = sm_image.split('-')[-1]
    sm_image_version = version.parse(resource_spec['SageMakerImageVersionAlias'])
    sm_image_version = f'{sm_image_version.major}.{sm_image_version.minor}'
except KeyError as e:
    print(f'Cannot find the key {e} in the DescribeSpace output. Make sure you run this notebook in a built-in SageMaker Distribution Image, not in the custom image')
    raise e

print(f"""
SageMaker image: \033[1m{sm_image}\033[0m
SageMaker image type: \033[1m{sm_dist_type}\033[0m
SageMaker image version: \033[1m{sm_image_version}\033[0m
""")

In [None]:
repo_name = "smd-custom-geo"
image_name = f'{sm_image_version}-{sm_dist_type}'

account_id = boto3.client("sts").get_caller_identity()["Account"]
region = sagemaker.Session().boto_region_name
ecr_uri = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{image_name}"

print(f'Custom image ECR URI: {ecr_uri}')

## Build a custom Docker image

There are several ways to build a custom SageMaker image. Whichever you choose, you need to implement the following steps:
1. Create a Dockerfile extending from SageMaker Distribution and configured with required packages, dependencies, and environments
2. Configure an Amazon ECR repository to host the images
3. Build and push the image to ECR repository
4. Attach the image to SageMaker AI Studio domain

For simplicity, this notebook builds and pushes the image using a bash script in code cells. For a real-world scenario, implement an automated image building and testing pipeline, and use Infrastructure as Code to automate the deployment.

For more details refer to this [example](https://github.com/aws-samples/sagemaker-custom-image-for-geospatial-analytics) and [SageMaker AI documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-admin-guide-custom-images.html).
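
Step 2 of the list above can also be done from Python instead of the AWS CLI. The following is a minimal sketch under stated assumptions: the helper name `ensure_ecr_repository` is hypothetical, and the ECR client is passed in as a parameter, for example `boto3.client('ecr')`. It relies on the `describe_repositories` and `create_repository` operations of the boto3 ECR client.

```python
def ensure_ecr_repository(ecr_client, repo_name: str) -> str:
    """Return the repository URI, creating the repository if it does not exist."""
    try:
        response = ecr_client.describe_repositories(repositoryNames=[repo_name])
        return response["repositories"][0]["repositoryUri"]
    except ecr_client.exceptions.RepositoryNotFoundException:
        # The repository is missing, so create it and return the new URI
        response = ecr_client.create_repository(repositoryName=repo_name)
        return response["repository"]["repositoryUri"]


# Usage in the notebook (requires AWS credentials and ECR permissions):
# repo_uri = ensure_ecr_repository(boto3.client("ecr"), repo_name)
```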

In [None]:
# %%bash

# docker system prune -af 

In [None]:
# Pass the variables to the build script
os.environ['REPO_NAME'] = repo_name
os.environ['SMD_DIST_TYPE'] = sm_dist_type
os.environ['SMD_VERSION'] = sm_image_version



In [None]:
%%bash

set -e

# Region, defaults to us-east-1
REGION=$AWS_DEFAULT_REGION
REGION=${REGION:-us-east-1}

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

echo "REPO_NAME: $REPO_NAME"
echo "ACCOUNT_ID: $ACCOUNT_ID"
echo "REGION: $REGION"
echo "SMD_DIST_TYPE: $SMD_DIST_TYPE"
echo "SMD_VERSION: $SMD_VERSION"

TAG=${REPO_NAME}:${SMD_VERSION}-${SMD_DIST_TYPE}
ECR_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${TAG}"

echo "IMAGE TAG: $TAG"
echo "ECR TARGET: ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${TAG}"

# If the repository doesn't exist in ECR, create it.
# The check runs inside `if` so that `set -e` does not abort the script when the repository is missing.
if ! aws ecr describe-repositories --repository-names "${REPO_NAME}" > /dev/null 2>&1
then
    aws ecr create-repository --repository-name "${REPO_NAME}" > /dev/null
fi

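# Log in to the ECR registry (the registry host, without the repository name and tag)
aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"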

# Build and push the image
set -x
docker build --network sagemaker -f Dockerfile --build-arg SMD_DISTRIBUTION_TYPE=$SMD_DIST_TYPE --build-arg SMD_VERSION=$SMD_VERSION -t $TAG .
set +x

docker tag $TAG ${ECR_URI}
docker push ${ECR_URI}

echo ""
echo "Created image pushed to ECR image URI: $ECR_URI"
echo ""
echo "Done"



After a successful build and push, list the images in the ECR repository. You should see the newly built image.

In [None]:
!aws ecr list-images --repository-name {repo_name}
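
Instead of inspecting the listing by eye, you could check for the expected tag programmatically. This is a minimal sketch, not part of the original notebook. The helper name `image_tag_exists` is hypothetical, and the ECR client is passed in as a parameter, for example `boto3.client('ecr')`. It uses the `describe_images` operation of the boto3 ECR client.

```python
def image_tag_exists(ecr_client, repo_name: str, image_tag: str) -> bool:
    """Return True if the repository contains an image with the given tag."""
    try:
        response = ecr_client.describe_images(
            repositoryName=repo_name,
            imageIds=[{"imageTag": image_tag}],
        )
    except ecr_client.exceptions.ImageNotFoundException:
        return False
    return len(response["imageDetails"]) > 0


# Usage in the notebook (requires AWS credentials and ECR permissions):
# image_tag_exists(boto3.client("ecr"), repo_name, image_name)
```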

## Attach image to the SageMaker domain

Now that the image is in the ECR repository in your account, you can attach it to the SageMaker AI domain.

In [None]:
# Output image URI
ecr_uri

Follow the instructions:

1. Open the [SageMaker AI console](https://console.aws.amazon.com/sagemaker)
2. Select **Admin configurations** on the left pane, choose **Domains**
3. Select the domain to which you want to attach the image
4. Select the **Environment** tab
5. In the section **Custom images for personal Studio apps**, select **Attach image**
6. Select **New image**, enter the image URI in the **Enter an ECR image URI** field, and select **Next**
7. Enter an **Image name** and **Image display name**, for example `smd-geo` and `SageMaker Distribution Geo 1.11 CPU`
8. Select **JupyterLab image** as application type and click on **Submit**
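
The console steps above are the workflow this notebook follows. As an alternative, the same attachment can probably be scripted with the SageMaker API. The sketch below only builds the request parameters for the `create_image`, `create_image_version`, `create_app_image_config`, and `update_domain` operations. The function name, the `-config` naming scheme, and the empty `JupyterLabAppImageConfig` are assumptions, so check them against the API reference before use. Note also that `CustomImages` in `update_domain` sets the whole list, so any images already attached to the domain would need to be included.

```python
def build_attach_requests(image_name, display_name, ecr_uri, role_arn, domain_id):
    """Build keyword arguments for the SageMaker calls that attach a custom image.

    The returned dict is ordered: call the operations in this order.
    """
    app_image_config_name = f"{image_name}-config"  # hypothetical naming scheme
    return {
        "create_image": {
            "ImageName": image_name,
            "DisplayName": display_name,
            "RoleArn": role_arn,
        },
        "create_image_version": {
            "ImageName": image_name,
            "BaseImage": ecr_uri,
        },
        "create_app_image_config": {
            "AppImageConfigName": app_image_config_name,
            "JupyterLabAppImageConfig": {},  # assumption: default container settings
        },
        "update_domain": {
            "DomainId": domain_id,
            "DefaultUserSettings": {
                "JupyterLabAppSettings": {
                    "CustomImages": [
                        {
                            "ImageName": image_name,
                            "AppImageConfigName": app_image_config_name,
                        }
                    ]
                }
            },
        },
    }


# Usage in the notebook (requires the corresponding IAM permissions):
# for operation, kwargs in build_attach_requests(
#     "smd-geo", "SageMaker Distribution Geo 1.11 CPU", ecr_uri, role, domain_id
# ).items():
#     getattr(sm_client, operation)(**kwargs)
```

In practice you would likely wait for the image version to finish creating before calling `update_domain`.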

## Create a JupyterLab app with the custom image

Follow the instructions:

In [25]:
# Show the Studio JupyterLab Apps link
display(
    HTML('<b>1. Open <a target="top" href="https://studio-{}.studio.{}.sagemaker.aws/jupyterlab">JupyterLab Spaces</a> in the Studio UI</b>'.format(
            domain_id, region))
)

2. Select **JupyterLab**
3. Select **+ Create JupyterLab space** and enter a name for your new space, for example `geo-custom`, select **Create space**
4. Select the custom image you attached to SageMaker Studio as **Image**
5. Select **Run Space**

![](../img/select-custom-image.png)

Wait until the space status changes to `Running` and select **Open**. This opens a JupyterLab app in a new browser tab.

## Use the custom image

Now you can use the custom image in one of two ways:
1. Use the image to run JupyterLab notebook kernels
2. Use the image to run SageMaker training and processing jobs

For these examples, **switch to the new JupyterLab app with the custom image** and open the [`06_smd_custom_geospatial`](../06_smd_custom_geospatial.ipynb) notebook.

---