# Application Package development and testing
## Water Bodies detection

Scenario: Alice implements the Application Package

## Background 
This Application Package takes as input Copernicus Sentinel-2 data and detects water bodies by applying the *Otsu* thresholding technique on the Normalized Difference Water Index (NDWI).

The NDWI is calculated with: 

$$
NDWI = { (green - nir) \over (green + nir) } 
$$

Typically, NDWI values of water bodies are larger than 0.2, and built-up features have positive values between 0 and 0.2. Vegetation has much smaller NDWI values, which makes it easier to distinguish vegetation from water bodies.
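The formula above can be sketched for a single pixel as follows. This is a minimal illustration, not the code shipped in the Application Package: the function name `ndwi`, the handling of a zero denominator and the reflectance samples are all assumptions made for the example.

```python
def ndwi(green: float, nir: float) -> float:
    """Return the NDWI of one pixel from its green and NIR values."""
    total = green + nir
    if total == 0:
        # Assumption: treat empty pixels as neutral to avoid dividing by zero.
        return 0.0
    return (green - nir) / total


# Hypothetical reflectance samples, given as (green, nir).
samples = {"water": (0.10, 0.02), "vegetation": (0.08, 0.40)}
ndwi_values = {name: ndwi(green, nir) for name, (green, nir) in samples.items()}

for name, value in ndwi_values.items():
    print(f"{name}: {value:+.2f}")
```

Water reflects more in the green band than in the NIR band, so its NDWI is positive, while vegetation reflects strongly in the NIR band and ends up with a negative value.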

The NDWI values correspond to the following ranges:

| Range       | Description                            |
| ----------- | -------------------------------------- |
| 0.2 to 1    | Water surface                          |
| 0.0 to 0.2  | Flooding, humidity                     |
| -0.3 to 0.0 | Moderate drought, non-aqueous surfaces |
| -1 to -0.3  | Drought, non-aqueous surfaces          |
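As a sketch, the ranges in the table can be turned into a lookup. The table does not say which class a boundary value belongs to, so this example assumes that a value sitting exactly on a boundary falls into the upper class; the names `BOUNDARIES`, `LABELS` and `classify_ndwi` are hypothetical.

```python
from bisect import bisect_right

# Class boundaries and labels taken from the table, lowest range first.
BOUNDARIES = [-0.3, 0.0, 0.2]
LABELS = [
    "Drought, non-aqueous surfaces",
    "Moderate drought, non-aqueous surfaces",
    "Flooding, humidity",
    "Water surface",
]


def classify_ndwi(value: float) -> str:
    """Return the table description for an NDWI value in [-1, 1]."""
    if not -1.0 <= value <= 1.0:
        raise ValueError(f"NDWI must lie in [-1, 1], got {value}")
    # Assumption: a boundary value is assigned to the upper class.
    return LABELS[bisect_right(BOUNDARIES, value)]


print(classify_ndwi(0.5))
print(classify_ndwi(-0.1))
```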

To simplify separating water surfaces from non-water surfaces, the Otsu thresholding technique is used. 

In its simplest form, the Otsu algorithm returns a single intensity threshold that separates pixels into two classes, foreground and background. This threshold is determined by minimizing intra-class intensity variance or, equivalently, by maximizing inter-class variance:

![image](https://upload.wikimedia.org/wikipedia/commons/3/34/Otsu%27s_Method_Visualization.gif)
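The criterion can be illustrated with a small pure-Python sketch that builds a histogram and keeps the split with the largest inter-class variance. It is only meant to show the idea: the function name `otsu_threshold`, the number of bins and the sample NDWI values are assumptions. In practice a library routine such as `skimage.filters.threshold_otsu` would normally be used instead.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes the inter-class variance."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # constant input: there is no meaningful split
    width = (hi - lo) / bins

    # Histogram of the values, with the maximum clamped into the last bin.
    hist = [0] * bins
    for value in values:
        hist[min(int((value - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]

    total = len(values)
    total_sum = sum(c * h for c, h in zip(centers, hist))
    best_threshold, best_variance = lo, -1.0
    weight_bg, sum_bg = 0, 0.0
    for i in range(bins - 1):
        # Move bin i from the foreground class to the background class.
        weight_bg += hist[i]
        sum_bg += centers[i] * hist[i]
        weight_fg = total - weight_bg
        if weight_bg == 0 or weight_fg == 0:
            continue
        mean_bg = sum_bg / weight_bg
        mean_fg = (total_sum - sum_bg) / weight_fg
        # Proportional to the inter-class variance of this split.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = lo + (i + 1) * width  # upper edge of bin i
    return best_threshold


# Hypothetical NDWI values: mostly land (negative) plus some water (positive).
ndwi_values = [-0.6] * 50 + [-0.5] * 50 + [0.4] * 30 + [0.5] * 30
threshold = otsu_threshold(ndwi_values)
water_mask = [value > threshold for value in ndwi_values]
print(f"threshold={threshold:.3f}, water pixels={sum(water_mask)}")
```

Because the threshold is derived from the data, the same code adapts to scenes whose NDWI distribution is shifted, which a fixed cut-off such as 0.2 cannot do.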

## Application Workflow
The Water Bodies detection steps are depicted below:
``` mermaid
graph TB
  A[STAC Items] --> B
  A[STAC Items] --> C
subgraph Process STAC item
  B["crop(green)"] --> D[Normalized difference];
  C["crop(nir)"] --> D[Normalized difference];
  D --> E[Otsu threshold]
end
  E --> F[Create STAC]
```

The application takes a list of Sentinel-2 STAC item references and crops the radiometric bands `green` and `NIR` to a user-defined area of interest (AOI). The cropped bands are then used to calculate the `NDWI`, and the Otsu threshold is applied to it, generating the water bodies output mask. The final step of the workflow consists of generating the STAC catalog and items for the generated results.

Alice organizes the Application Package to include a macro workflow that reads the list of Sentinel-2 STAC item references, the AOI and the EPSG code. The workflow steps include i) a sub-workflow for the detection of the water bodies and ii) a step to create the STAC catalog of the generated output product(s).

![image](water_bodies.png "water-bodies")

The sub-workflow applies the `crop`, `Normalized difference` and `Otsu threshold` steps:

![image](detect_water_body.png "detect-water-body")

## Input Sentinel-2 acquisitions
The development and test dataset is made of two Sentinel-2 acquisitions:

| Acquisitions 	|Image 1                    	|Image 2                    	|
|--------------	|---------------------------	|---------------------------	|
| Date         	|2021-07-13                 	|2022-05-24                 	|
| URL          	| [S2B_10TFK_20210713_0_L2A](https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_10TFK_20210713_0_L2A) 	| [S2A_10TFK_20220524_0_L2A](https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2A_10TFK_20220524_0_L2A) 	|
| Quicklook    	| ![image](img_20210713.jpg) 	| ![image](img_20220504.jpg) 	|

## Environments creation

Each `Command Line Tool` step (`crop`, `Normalized difference`, `Otsu threshold` and `Create STAC`) runs a Python script in a dedicated environment / container. 
To generate the environments, open a new `Terminal` and execute the commands below (either one by one or all at once).

**Note**: This configuration step takes around five minutes to complete.

In [3]:
!mamba create -c conda-forge -y -p /srv/conda/envs/env_test  gdal click pystac 
!mamba create -c conda-forge -y -p /srv/conda/envs/env_norm_diff click gdal  
!mamba create -c conda-forge -y -p /srv/conda/envs/env_otsu gdal scikit-image click 
!mamba create -c conda-forge -y -p /srv/conda/envs/env_stac click pystac python=3.9 pip && \
    /srv/conda/envs/env_stac/bin/pip install rio_stac
!mamba clean --all -f -y



## Application Package inspection

Open the `app-package.cwl` Application Package and familiarise yourself with its structure, to understand what's going on during execution:  

1. Inspect the main workflow whose `id` is **`water_bodies`**: 
    1.1. What are the input parameters? *(stac_items, aoi, epsg)*
    1.2. What are the steps of this workflow? *(node_water_bodies, node_stac)* 
2. Inspect the workflow whose `id` is **`detect_water_body`**:
    2.1. What are the steps of this workflow? *(node_crop, node_normalized_difference, node_otsu)*
3. Inspect each `CommandLineTool`, identified by the `id`s **`crop`**, **`norm_diff`**, **`otsu`** and **`stac`**:
    3.1. Inspect the `Dockerfile` of each tool.
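The inspection steps above can also be done programmatically. The sketch below assumes that the Application Package uses the CWL `$graph` layout and that it has already been parsed into a Python dictionary (for instance with PyYAML's `yaml.safe_load`). The helper name `summarize_graph` and the trimmed example document, including its input types, are hypothetical.

```python
def summarize_graph(document: dict) -> dict:
    """Map each `$graph` entry id to its class, input names and step names."""

    def names(section):
        # CWL accepts both a mapping (name -> definition) and a list of
        # objects that each carry an `id`.
        if isinstance(section, dict):
            return list(section)
        return [entry["id"] for entry in section or []]

    summary = {}
    for entry in document.get("$graph", []):
        summary[entry["id"]] = {
            "class": entry.get("class"),
            "inputs": names(entry.get("inputs")),
            "steps": names(entry.get("steps")),
        }
    return summary


# Hypothetical, heavily trimmed stand-in for the parsed app-package.cwl.
example = {
    "$graph": [
        {
            "class": "Workflow",
            "id": "water_bodies",
            "inputs": {"stac_items": "string[]", "aoi": "string", "epsg": "string"},
            "steps": {"node_water_bodies": {}, "node_stac": {}},
        },
        {"class": "CommandLineTool", "id": "crop", "inputs": {"band": "string"}},
    ]
}

summary = summarize_graph(example)
for entry_id, info in summary.items():
    print(entry_id, info)
```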


## Application Package execution

The water bodies Application Package can be executed with: 
```
cwltool --no-container app-package.cwl#water_bodies params.yml > out.json
```
where:
* `cwltool` is a Common Workflow Language runner. 
* The flag `--no-container` is used to instruct `cwltool` to use the local command-line tools instead of using the containers.
* `app-package.cwl#water_bodies` defines the CWL file to execute as well as the entry point after the `#` symbol. Here it's the `Workflow` with the id `water_bodies`.
* The file `params.yml` is used to define the input parameters. In this case, these are:

```
stac_items:
- "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_10TFK_20210713_0_L2A"
- "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2A_10TFK_20220524_0_L2A"

aoi: "-121.399,39.834,-120.74,40.472"
epsg: "EPSG:4326"
```
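* `out.json` stores the JSON description of the outputs that `cwltool` prints to standard output. The execution log is written to standard error, so it stays visible in the terminal.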
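Before launching a run it can help to check the `aoi` value. The sketch below assumes that the string is ordered as `min_lon,min_lat,max_lon,max_lat` in degrees, which matches the example values and the `EPSG:4326` code but is not stated explicitly by the Application Package. The helper name `parse_aoi` is hypothetical.

```python
def parse_aoi(aoi: str) -> tuple:
    """Parse 'min_lon,min_lat,max_lon,max_lat' into a tuple of four floats."""
    parts = [float(part) for part in aoi.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {len(parts)}")
    min_lon, min_lat, max_lon, max_lat = parts
    # Assumption: coordinates are geographic degrees (EPSG:4326).
    if not -180 <= min_lon < max_lon <= 180:
        raise ValueError("longitudes must satisfy -180 <= min_lon < max_lon <= 180")
    if not -90 <= min_lat < max_lat <= 90:
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    return (min_lon, min_lat, max_lon, max_lat)


bbox = parse_aoi("-121.399,39.834,-120.74,40.472")
print(bbox)
```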

In [4]:
!cwltool --no-container ../water-bodies/app-package.cwl#water_bodies ../water-bodies/params.yml > out.json

INFO /srv/conda/bin/cwltool 3.1.20220224085855
INFO Resolved '../water-bodies/app-package.cwl#water_bodies' to 'file:///workspace/workshop/07_app_package/water-bodies/app-package.cwl#water_bodies'
INFO [workflow ] start
INFO [workflow ] starting step node_water_bodies
INFO [step node_water_bodies] start
INFO [workflow node_water_bodies] start
INFO [workflow node_water_bodies] starting step node_crop
INFO [step node_crop] start
INFO [job node_crop] /tmp/2k8ek72k$ python \
    -m \
    app \
    --aoi \
    -121.399,39.834,-120.74,40.472 \
    --band \
    green \
    --epsg \
    EPSG:4326 \
    --input-item \
    https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_10TFK_20210713_0_L2A
INFO [job node_crop] Max memory used: 68MiB
INFO [job node_crop] completed success
INFO [step node_crop] start
INFO [job node_crop

## Result inspection

Running `cwltool` generates the `out.json` output file, as well as a folder whose name is an 8-character alphanumeric string. This folder stores the generated `catalog.json`, plus the `otsu.tif` and related STAC item for each of the two input Sentinel-2 images, in the structure below:
* `catalog.json`
* `S2B_10TFK_20210713_0_L2A`
    * `otsu.tif`
    * `S2B_10TFK_20210713_0_L2A.json`
* `S2A_10TFK_20220524_0_L2A`
    * `otsu.tif`
    * `S2A_10TFK_20220524_0_L2A.json`
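A quick way to check that a run produced this layout is to walk the output folder. The sketch below is an assumption-laden illustration: the helper name `list_results` is hypothetical, and the example builds a temporary folder with empty placeholder files, named after the two items of the test dataset, rather than reading real results.

```python
import tempfile
from pathlib import Path


def list_results(output_dir: Path) -> dict:
    """Map each item folder name to the sorted names of the files it contains."""
    catalog = output_dir / "catalog.json"
    if not catalog.is_file():
        raise FileNotFoundError(f"no catalog.json in {output_dir}")
    return {
        folder.name: sorted(p.name for p in folder.iterdir() if p.is_file())
        for folder in sorted(output_dir.iterdir())
        if folder.is_dir()
    }


# Build a throwaway folder that mimics the layout, using placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "catalog.json").write_text('{"type": "Catalog"}')
    for item_id in ("S2B_10TFK_20210713_0_L2A", "S2A_10TFK_20220524_0_L2A"):
        (root / item_id).mkdir()
        (root / item_id / "otsu.tif").touch()
        (root / item_id / f"{item_id}.json").write_text("{}")
    results = list_results(root)

for item_id, files in results.items():
    print(item_id, files)
```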

You can plot the output `otsu.tif` files with the `visualisation.ipynb` Jupyter Notebook. This Notebook uses `pystac` to access the GeoTIFFs produced and `leafmap` to plot the tiles served by a local tile server. Open the Notebook and run all cells. 

First, run the commands below to create the environment:

In [6]:
!mamba create -c conda-forge -y -p /srv/conda/envs/env_visual pystac ipykernel jupyterlab localtileserver jupyter-server-proxy pip && \
    /srv/conda/envs/env_visual/bin/pip install leafmap



In [7]:
# Import libraries
from localtileserver import TileClient, get_leaflet_tile_layer
import leafmap.foliumap as leafmap

import pystac
import json
import numpy as np
import os

# Define env variables for localtileserver
os.environ["GTIFF_SRS_SOURCE"] = "EPSG"
os.environ['LOCALTILESERVER_CLIENT_PREFIX'] = 'proxy/{port}'
os.environ['PROJ_DATA'] = '/srv/conda/envs/env_visual/share/proj/'
os.environ['GDAL_DATA'] = '/srv/conda/envs/env_visual/share/gdal/'

Load the JSON result listing generated by `cwltool`:

In [8]:
with open("/workspace/workshop/07_app_package/water-bodies/outputs/out.json") as f: 
    results = json.load(f)

Look for the `catalog.json` file:

In [9]:
for item in results["stac_catalog"]["listing"]:
    
    if item['basename'] == "catalog.json":
        catalog = pystac.read_file(item["path"])
        break

List the contents of the STAC Catalog:

In [10]:
catalog.describe()

* <Catalog id=catalog>
  * <Item id=S2B_10TFK_20210713_0_L2A>
  * <Item id=S2A_10TFK_20220524_0_L2A>


In [11]:
it1 = catalog.get_item('S2B_10TFK_20210713_0_L2A')
it1

id: S2B_10TFK_20210713_0_L2A
bbox: [-121.413752588606, 39.83402935827303, -120.71922542174708, 40.47202226335379]
proj:epsg: 32610
proj:geometry: {'type': 'Polygon', 'coordinates': [[[635710.0, 4411780.0], [693380.0, 4411780.0], [693380.0, 4481380.0], [635710.0, 4481380.0], [635710.0, 4411780.0]]]}
proj:bbox: [635710.0, 4411780.0, 693380.0, 4481380.0]
proj:shape: [6960, 5767]
proj:transform: [10.0, 0.0, 635710.0, 0.0, -10.0, 4481380.0, 0.0, 0.0, 1.0]
proj:projjson: {'$schema': 'https://proj.org/schemas/v0.4/projjson.schema.json', 'type': 'ProjectedCRS', 'name': 'WGS 84 / UTM zone 10N', 'base_crs': {'name': 'WGS 84', 'datum': {'type': 'GeodeticReferenceFrame', 'name': 'World Geodetic System 1984', 'ellipsoid': {'name': 'WGS 84', 'semi_major_axis': 6378137, 'inverse_flattening': 298.257223563}}, 'coordinate_system': {'subtype': 'ellipsoidal', 'axis': [{'name': 'Geodetic latitude', 'abbreviation': 'Lat', 'direction': 'north', 'unit': 'degree'}, {'name': 'Geodetic longitude', 'abbreviation': 'Lon', 'direction': 'east', 'unit': 'degree'}]}, 'id': {'authority': 'EPSG', 'code': 4326}}, 'conversion': {'name': 'UTM zone 10N', 'method': {'name': 'Transverse Mercator', 'id': {'authority': 'EPSG', 'code': 9807}}, 'parameters': [{'name': 'Latitude of natural origin', 'value': 0, 'unit': 'degree', 'id': {'authority': 'EPSG', 'code': 8801}}, {'name': 'Longitude of natural origin', 'value': -123, 'unit': 'degree', 'id': {'authority': 'EPSG', 'code': 8802}}, {'name': 'Scale factor at natural origin', 'value': 0.9996, 'unit': 'unity', 'id': {'authority': 'EPSG', 'code': 8805}}, {'name': 'False easting', 'value': 500000, 'unit': 'metre', 'id': {'authority': 'EPSG', 'code': 8806}}, {'name': 'False northing', 'value': 0, 'unit': 'metre', 'id': {'authority': 'EPSG', 'code': 8807}}]}, 'coordinate_system': {'subtype': 'Cartesian', 'axis': [{'name': 'Easting', 'abbreviation': '', 'direction': 'east', 'unit': 'metre'}, {'name': 'Northing', 'abbreviation': '', 'direction': 'north', 'unit': 'metre'}]}, 'id': {'authority': 'EPSG', 'code': 32610}}
datetime: 2021-07-13T19:03:24Z

https://stac-extensions.github.io/projection/v1.0.0/schema.json

href: ./otsu.tif
type: image/tiff; application=geotiff
roles: ['data', 'visual']
owner: S2B_10TFK_20210713_0_L2A

rel: root
href: /workspace/workshop/07_app_package/water-bodies/outputs/catalog.json
type: application/json

rel: self
href: /workspace/workshop/07_app_package/water-bodies/outputs/S2B_10TFK_20210713_0_L2A/S2B_10TFK_20210713_0_L2A.json
type: application/json

rel: parent
href: /workspace/workshop/07_app_package/water-bodies/outputs/catalog.json
type: application/json


In [12]:
it1.properties

{'proj:epsg': 32610,
 'proj:geometry': {'type': 'Polygon',
  'coordinates': [[[635710.0, 4411780.0],
    [693380.0, 4411780.0],
    [693380.0, 4481380.0],
    [635710.0, 4481380.0],
    [635710.0, 4411780.0]]]},
 'proj:bbox': [635710.0, 4411780.0, 693380.0, 4481380.0],
 'proj:shape': [6960, 5767],
 'proj:transform': [10.0, 0.0, 635710.0, 0.0, -10.0, 4481380.0, 0.0, 0.0, 1.0],
 'proj:projjson': {'$schema': 'https://proj.org/schemas/v0.4/projjson.schema.json',
  'type': 'ProjectedCRS',
  'name': 'WGS 84 / UTM zone 10N',
  'base_crs': {'name': 'WGS 84',
   'datum': {'type': 'GeodeticReferenceFrame',
    'name': 'World Geodetic System 1984',
    'ellipsoid': {'name': 'WGS 84',
     'semi_major_axis': 6378137,
     'inverse_flattening': 298.257223563}},
   'coordinate_system': {'subtype': 'ellipsoidal',
    'axis': [{'name': 'Geodetic latitude',
      'abbreviation': 'Lat',
      'direction': 'north',
      'unit': 'degree'},
     {'name': 'Geodetic longitude',
      'abbreviation': 'Lon',


In [13]:
m = leafmap.Map()

for item in catalog.get_all_items():
    m.add_raster(item.get_assets()["data"].get_absolute_href(), layer_name=item.id) #, crs="EPSG:32610")

m