# How to use SageMaker Processing with the geospatial image

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

---

## Introduction

The following notebook shows you how to run geospatial workloads in a scale-out fashion by using SageMaker Processing with the geospatial image. In this example, Sentinel-2 data will be selected and then processed via a SageMaker Processing job to calculate the normalized difference vegetation index (NDVI).

## Overview of SageMaker Processing

The following diagram shows how Amazon SageMaker spins up a Processing job. Amazon SageMaker takes your script, copies your data from Amazon Simple Storage Service (Amazon S3), and then pulls a processing container. For custom geospatial operations, AWS provides a purpose-built container with commonly used open-source geospatial libraries.

The underlying infrastructure for a Processing job is fully managed by Amazon SageMaker. Cluster resources are provisioned for the duration of your job, and cleaned up when a job completes. The output of the Processing job is stored in the Amazon S3 bucket you specified.

![processing overview](https://docs.aws.amazon.com/images/sagemaker/latest/dg/images/Processing-1.png)

### Documentation on SageMaker Processing

- https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-custom-operations.html
- https://sagemaker.readthedocs.io/en/stable/api/training/processing.html

## Prerequisites

This notebook runs with the Geospatial 1.0 kernel on an `ml.geospatial.interactive` instance. The following policies must be attached to the execution role that you use to run this notebook:
- AmazonSageMakerFullAccess
- AmazonSageMakerGeospatialFullAccess

You can see the policies attached to the role in the IAM console under the 'Permissions' tab. If required, add the policies using the 'Add Permissions' button.

In addition to these policies, ensure that the execution role's trust policy allows the SageMaker geospatial service (`sagemaker-geospatial.amazonaws.com`) to assume the role. To do this, add the following trust policy on the 'Trust relationships' tab:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "sagemaker.amazonaws.com",
                    "sagemaker-geospatial.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
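
Before starting a job, it can help to confirm that a trust policy names both service principals. The following is a minimal sketch that inspects a trust policy document already loaded as a Python dictionary. The helper name `missing_trusted_services` is hypothetical, and the sketch assumes you have obtained the policy document yourself (for example by copying it from the IAM console).

```python
REQUIRED_SERVICES = {
    "sagemaker.amazonaws.com",
    "sagemaker-geospatial.amazonaws.com",
}


def missing_trusted_services(trust_policy, required=REQUIRED_SERVICES):
    """Return the required service principals that may not assume the role."""
    allowed = set()
    for statement in trust_policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        # "Action" may be a single string or a list of strings.
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue
        # "Service" may also be a single string or a list of strings.
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        allowed.update(services)
    return sorted(set(required) - allowed)


trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "sagemaker.amazonaws.com",
                    "sagemaker-geospatial.amazonaws.com",
                ]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

print(missing_trusted_services(trust_policy))  # prints []
```

An empty list means both services are trusted; any name in the result still needs to be added to the trust policy.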

## Import SageMaker geospatial capabilities SDK

In [None]:
import boto3
import sagemaker
import json
import sagemaker_geospatial_map

session = boto3.Session()
execution_role = sagemaker.get_execution_role()
geospatial_client = session.client(service_name="sagemaker-geospatial")

## Query the Sentinel-2 raster data collection using SearchRasterDataCollection

With `search_raster_data_collection`, you can query supported raster data collections. This example uses data from the Sentinel-2 satellites. The area of interest (`AreaOfInterest`) is in western Idaho, the time range (`TimeRangeFilter`) is January 1, 2022 to December 31, 2022, and a property filter (`PropertyFilters`) keeps only scenes whose `EoCloudCover` value lies between 0 and 2.

In the following code examples, you use the ARN associated with the Sentinel-2 raster data collection, `arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8`.

A `search_raster_data_collection` API request requires two parameters:
- `Arn`: The Amazon Resource Name (ARN) of the raster data collection that you want to query.
- `RasterDataCollectionQuery`: The query itself, which contains the area of interest and the desired time range, and optionally property filters.
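
The `PolygonGeometry` coordinates used in this notebook form a closed ring of `[longitude, latitude]` pairs, where the last position repeats the first. The following is a minimal sketch of building such a ring from a bounding box. The helper name `bbox_to_polygon` is hypothetical and is not part of the SageMaker SDK.

```python
def bbox_to_polygon(min_lon, min_lat, max_lon, max_lat):
    """Return a closed rectangular ring, wrapped in a list, as [lon, lat] pairs."""
    ring = [
        [min_lon, max_lat],  # upper left
        [min_lon, min_lat],  # lower left
        [max_lon, min_lat],  # lower right
        [max_lon, max_lat],  # upper right
        [min_lon, max_lat],  # repeat the first position to close the ring
    ]
    return [ring]


coordinates = bbox_to_polygon(
    min_lon=-117.04389469702484,
    min_lat=43.70389023789181,
    max_lon=-116.97173284570357,
    max_lat=43.734007425992814,
)
print(coordinates[0][0] == coordinates[0][-1])  # prints True
```

The resulting list can be used as the value of `"Coordinates"` in the request and reproduces the polygon used in this notebook.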

In [None]:
search_params = {
    "Arn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",  # Sentinel-2 L2A data
    "RasterDataCollectionQuery": {
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": [
                        [
                            [-117.04389469702484, 43.734007425992814],
                            [-117.04389469702484, 43.70389023789181],
                            [-116.97173284570357, 43.70389023789181],
                            [-116.97173284570357, 43.734007425992814],
                            [-117.04389469702484, 43.734007425992814],
                        ]
                    ]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2022-01-01T00:00:00Z",
            "EndTime": "2022-12-31T23:59:59Z",
        },
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0.0, "UpperBound": 2.0}}}],
            "LogicalOperator": "AND",
        },
    },
}

In [None]:
items = []
while True:
    search_result = geospatial_client.search_raster_data_collection(**search_params)
    items.extend(search_result["Items"])
    next_token = search_result.get("NextToken")
    if not next_token:
        # Remove any stale token so this cell can be re-run safely.
        search_params.pop("NextToken", None)
        break
    search_params["NextToken"] = next_token

print(
    "Found {} Sentinel-2 scenes for provided AOI, property filters and time range".format(
        len(items)
    )
)
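
The loop above follows the `NextToken` pagination pattern: each response may carry a token that must be sent with the next request until no token is returned. The following is a minimal sketch of the same pattern as a reusable generator. It runs against a stub function (`fake_search`, a hypothetical stand-in for `geospatial_client.search_raster_data_collection`) so that it works without AWS credentials; the helper name `paginate` is also hypothetical.

```python
def paginate(call, params, items_key="Items", token_key="NextToken"):
    """Yield every item from a token-paginated API without mutating params."""
    request = dict(params)  # copy so the caller's dictionary stays untouched
    while True:
        page = call(**request)
        yield from page.get(items_key, [])
        token = page.get(token_key)
        if not token:
            return
        request[token_key] = token


def fake_search(Arn, NextToken=None):
    """Stub that returns two pages, keyed by the token it receives."""
    pages = {
        None: {"Items": [{"Id": "a"}, {"Id": "b"}], "NextToken": "t1"},
        "t1": {"Items": [{"Id": "c"}]},
    }
    return pages[NextToken]


found = list(paginate(fake_search, {"Arn": "example-arn"}))
print(len(found))  # prints 3
```

Because the generator copies `params`, the original request dictionary never ends up holding a leftover token.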

## Create manifest file

When you run a processing job, you must specify a data input from Amazon S3. The input can be either an S3 prefix, in which case every object under that prefix is processed, or a manifest file that points to the individual data files. In a manifest, the first entry defines a common prefix and the remaining entries are object keys relative to it. The following code example builds the manifest from the search results; later cells define where it is stored in Amazon S3 and upload it.
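
To make the manifest layout concrete, the following sketch shows how manifest entries resolve to full S3 URIs. The helper name `resolve_manifest` is hypothetical, and the single relative path is an illustrative entry that assumes the key layout produced by the code in this section for one of the scene IDs used later in this notebook.

```python
def resolve_manifest(manifest):
    """Combine the manifest prefix with each relative key to get full S3 URIs."""
    prefix = manifest[0]["prefix"]
    return [prefix + relative_key for relative_key in manifest[1:]]


example_manifest = [
    {"prefix": "s3://sentinel-cogs/sentinel-s2-l2a-cogs/"},
    # Illustrative entry; real entries come from the search results.
    "11/T/NJ/2022/3/S2A_11TNJ_20220312_0_L2A/S2A_11TNJ_20220312_0_L2A.json",
]

for uri in resolve_manifest(example_manifest):
    print(uri)
```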

In [None]:
def s2_item_to_relative_metadata_url(item):
    parts = item["Assets"]["visual"]["Href"].split("/")
    tile_prefix = parts[4:-1]
    return "{}/{}.json".format("/".join(tile_prefix), item["Id"])


manifest = [{"prefix": "s3://sentinel-cogs/sentinel-s2-l2a-cogs/"}]

for item in items:
    manifest.append(s2_item_to_relative_metadata_url(item))

In [None]:
# show first 5 entries in manifest
manifest[0:5]

In [None]:
sagemaker_session = sagemaker.Session()

s3_bucket_name = sagemaker_session.default_bucket()
s3_prefix = "processing-geospatial-ndvi-example"
s3_key_manifest = f"{s3_prefix}/manifest.json"

In [None]:
s3 = boto3.resource("s3")

s3object = s3.Object(s3_bucket_name, s3_key_manifest)
response = s3object.put(Body=json.dumps(manifest).encode("utf-8"))

## Create Processing Job entry script (NDVI calculation)

Amazon SageMaker Studio supports the `%%writefile` cell magic command. Running a cell that starts with this command saves the cell's contents to a file in your local Studio directory. The code below calculates NDVI, but you can replace it with any business logic.
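
NDVI is computed per pixel as (NIR - Red) / (NIR + Red), which yields values between -1 and 1 for non-negative inputs. The following is a minimal per-pixel sketch in plain Python. The function name `ndvi` and the guard against a zero denominator are additions for illustration; the processing script applies the same formula element-wise to whole rasters.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel, or None when NIR + Red is zero."""
    total = nir + red
    if total == 0:
        return None  # NDVI is undefined for this pixel
    return (nir - red) / total


print(ndvi(3000, 1000))  # prints 0.5
print(ndvi(1000, 3000))  # prints -0.5
print(ndvi(0, 0))  # prints None
```

Higher values indicate that a pixel reflects much more near-infrared than red light, which is typical of healthy vegetation.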

In [None]:
%%writefile compute_ndvi.py

import json
import os

import rioxarray

if __name__ == "__main__":
    print("Starting processing")

    input_data_path = "/opt/ml/processing/input_data/"
    input_files = []

    for current_path, sub_dirs, files in os.walk(input_data_path):
        for file in files:
            if file.endswith(".json"):
                input_files.append(os.path.join(current_path, file))

    print("Received {} input_files: {}".format(len(input_files), input_files))

    items = []
    for input_file in input_files:
        # input_file is already an absolute path built by os.walk above
        with open(input_file, "r") as f:
            items.append(json.load(f))

    for item in items:
        print("Computing NDVI for {}".format(item["id"]))
        red_uri = item["assets"]["red"]["href"]
        nir_uri = item["assets"]["nir"]["href"]

        red = rioxarray.open_rasterio(red_uri, masked=True)
        nir = rioxarray.open_rasterio(nir_uri, masked=True)

        ndvi = (nir - red) / (nir + red)

        file_name = "ndvi_" + item["id"] + ".tif"
        output_path = "/opt/ml/processing/output_data"
        output_file_path = f"{output_path}/{file_name}"

        ndvi.rio.to_raster(output_file_path)
        print("Written output:", output_file_path)

## Setup and execute Processing Job

This notebook uses the [ScriptProcessor](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ScriptProcessor) class that is available via the Amazon SageMaker Python SDK. First, you need to create an instance of the class, and then you can start your Processing job by using the `.run()` method.

In [None]:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor

region = sagemaker.Session().boto_region_name
role = get_execution_role()

geospatial_image_uri = (
    "081189585635.dkr.ecr.us-west-2.amazonaws.com/sagemaker-geospatial-v1-0:latest"
)
processor = ScriptProcessor(
    command=["python3"],
    image_uri=geospatial_image_uri,
    role=role,
    instance_count=5,
    instance_type="ml.m5.xlarge",
    base_job_name="geospatial-processing-example-ndvi",
)

In [None]:
s3_manifest_url = f"s3://{s3_bucket_name}/{s3_key_manifest}"
s3_output_prefix_url = f"s3://{s3_bucket_name}/{s3_prefix}/output"

processor.run(
    code="compute_ndvi.py",
    inputs=[
        ProcessingInput(
            source=s3_manifest_url,
            destination="/opt/ml/processing/input_data/",
            s3_data_type="ManifestFile",
            s3_data_distribution_type="ShardedByS3Key",
        ),
    ],
    outputs=[
        ProcessingOutput(source="/opt/ml/processing/output_data/", destination=s3_output_prefix_url)
    ],
)
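
The `run()` call above sets `s3_data_distribution_type="ShardedByS3Key"`, so each instance receives only a portion of the objects listed in the manifest instead of a full copy. The following sketch illustrates the idea with a simple round-robin split. This is an assumption made purely for illustration: the way SageMaker actually assigns objects to instances is not described in this notebook, and the helper name `shard_round_robin` and the key names are hypothetical.

```python
def shard_round_robin(keys, instance_count):
    """Split keys into instance_count groups, assigning them in turn."""
    shards = [[] for _ in range(instance_count)]
    for index, key in enumerate(keys):
        shards[index % instance_count].append(key)
    return shards


# Hypothetical object keys standing in for the manifest entries.
keys = [f"scene-{n}.json" for n in range(12)]
shards = shard_round_robin(keys, instance_count=5)
print([len(shard) for shard in shards])  # prints [3, 3, 2, 2, 2]
```

Whatever the exact assignment, the practical consequence is the same: the entry script must only rely on the files it finds under its local input directory.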

In [None]:
preprocessing_job_descriptor = processor.jobs[-1].describe()
s3_output_uri = preprocessing_job_descriptor["ProcessingOutputConfig"]["Outputs"][0]["S3Output"][
    "S3Uri"
]
s3_output_uri

## Visualizing output

After the processing job is completed, you can visualize the data that has been created as output. In this example, we will select a few samples for visualization.

In [None]:
import rioxarray

samples = [
    "S2A_11TNJ_20220312_0_L2A",
    "S2B_11TNJ_20220615_0_L2A",
    "S2A_11TNJ_20220710_0_L2A",
    "S2B_11TNJ_20220814_0_L2A",
    "S2B_11TNJ_20220923_0_L2A",
]

example_rasters = []
for sample_id in samples:
    example_rasters.append(
        (sample_id, rioxarray.open_rasterio(f"{s3_output_uri}/ndvi_{sample_id}.tif"))
    )

In [None]:
import matplotlib.pyplot as plt

for sample_id, raster in example_rasters:
    plt.figure(figsize=(7, 7))
    ax = plt.axes()
    # clip to an area in the scene
    clipped = raster.rio.clip_box(
        minx=500000,
        miny=4800000.0,
        maxx=540000,
        maxy=4840000.0,
    )
    clipped.plot(ax=ax, cmap="RdYlGn", vmin=-1, vmax=1)
    month = sample_id.split("_")[2][4:6]
    ax.set_title(f"Example: NDVI for {month}/2022")
    plt.show()
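
The plot titles above extract the month from the scene ID by position. The following is a minimal sketch of parsing the ID more explicitly. It assumes the five underscore-separated fields visible in the sample IDs used here; the field names and the helper name `parse_scene_id` are an interpretation for illustration, not an official schema.

```python
from datetime import datetime


def parse_scene_id(scene_id):
    """Split an ID such as S2A_11TNJ_20220312_0_L2A into named fields."""
    platform, tile, day, sequence, level = scene_id.split("_")
    return {
        "platform": platform,
        "tile": tile,
        "date": datetime.strptime(day, "%Y%m%d").date(),
        "sequence": int(sequence),
        "level": level,
    }


scene = parse_scene_id("S2A_11TNJ_20220312_0_L2A")
print(scene["date"].strftime("%m/%Y"))  # prints 03/2022
```

Parsing the date this way raises a `ValueError` for malformed IDs instead of silently producing a wrong title.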

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2, which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-geospatial|processing-geospatial-ndvi|geospatial-processing-ndvi-intro.ipynb)
