# Production/execution template Jupyter notebook

This notebook is a generic template for the data notebooks repository. It is based on a notebook that collects digital elevation data from an online source (a STAC endpoint hosted by Geoscience Australia). You can use this notebook as a guide for creating your own notebooks.

Use this template as a starting point so that, as the notebook library grows, formatting stays consistent and every notebook includes the required core packages, parameters and tags.
Please refer to the README.md for more information and a more detailed breakdown of these templates.

## Execution templates

Execution (also called production) notebooks have some additional requirements over the exploration notebooks.

Execution notebooks must include papermill comments and tags.

### Parameter definition and tags

For execution notebooks to work, both `papermill comments` and `cell tags` must be included. Each cell in this template has these included. `Papermill comments` are different to regular comments, though the syntax is the same. Take care to not remove the papermill comments when cleaning up a notebook for production.


### Parameter cell
One cell must be designated `parameters`. This cell must have BOTH the `papermill comment` designating it as the parameters cell, and the `tag`.

Parameters that need to be passed into the notebook must be included in the parameter cell. If the user will be inputting variables in the front end of the platform, for example a date range or a dataset name, define them here.

Example:

#### papermill comment:
`papermill_description=parameters`

#### Jupyter tags in VS Code:
- Use the 'more actions' button in the top right of a cell to access tags.
- Select `add cell tag`.
- Name the cell tag `parameters`.
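
A quick way to confirm that a notebook meets both requirements is to inspect its JSON directly. The sketch below assumes the standard `.ipynb` layout, where each cell stores its tags under `metadata.tags`. The function name `find_parameter_cells` and the in-memory example notebook are hypothetical and are not part of this repository.

```python
PAPERMILL_COMMENT = "papermill_description=parameters"


def find_parameter_cells(notebook):
    """Return indices of code cells that carry BOTH the tag and the comment."""
    matches = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        source = cell.get("source", "")
        # The notebook format stores source as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if "parameters" in tags and PAPERMILL_COMMENT in source:
            matches.append(index)
    return matches


# Small in-memory notebook used only for illustration
example_notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["imports"]},
         "source": ["#papermill_description=imports\n", "import json\n"]},
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": ["#papermill_description=parameters\n", "propertyName = 'test'\n"]},
    ]
}
print(find_parameter_cells(example_notebook))
```

To check a notebook on disk, load it with `json.load` and pass the resulting dictionary to the function. Exactly one index should be returned.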

## Schema.json file
When creating a notebook that will be usable in the platform and accessed by users, you will also need to create a `schema.json` file. There are examples of these in the `production/` directory and an example is provided in this directory too.


### Install packages

Unlike the experimental notebooks, you need to make sure all packages used in a production/execution notebook are included in the GIS image. You can add packages to the `requirements.txt` files in this repository, following the guidelines in the README.md.

If you are updating the GIS image, you will need to rebuild the image and restart the kernel in this notebook. You will also need access to the ECR repository through aws-vault.

If you do not need to add new packages, you can safely proceed to importing the packages in the next cell.

Checklist for execution notebooks:
- Add a `papermill_description` comment to every cell.
- Add a tag to every cell.
- The parameters cell MUST have both the `parameters` tag and the papermill comment.
- Include a `notebook_key` parameter in executable notebooks.
- The papermill descriptions are written as comments and are referred to as comments. They are separate from tags.
- TODO: provide an example of both the exploratory and the executable notebook template.

In [None]:
#papermill_description=imports

# Core packages
import json
import os
from io import StringIO

# Data manipulation
import numpy as np

# Geospatial
import geopandas as gpd
import rasterio
import pystac_client
from gis_utils.stac import initialize_stac_client, query_stac_api, save_metadata_sidecar, process_dem_asset_and_mask
from gis_utils.dataframe import get_bbox_from_geodf
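import rasterio.plot
from rasterio.plot import reshape_as_raster
from rio_cogeo.cogeo import cog_translate
from rio_cogeo.profiles import cog_profiles
from rasterio.io import MemoryFile
from rasterio.warp import calculate_default_transform

# Plotting (colour mapping for the output raster)
from matplotlib import cm
from matplotlib.colors import Normalize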


# AWS_NO_SIGN_REQUEST is a GDAL flag and does not impact AWS access.
# It is used for accessing public buckets, which we do for some AWS earth data repositories.
os.environ['AWS_NO_SIGN_REQUEST'] = 'YES'

### Functions

It is good practice to keep your functions at the top of the notebook so you can easily find them. Please include documentation for your functions in a notebook being used in production.

In [None]:
#papermill_description=get_coords_from_geodataframe

def get_coords_from_geodataframe(gdf):
    """Return the geometry of the first feature in a GeoDataFrame, wrapped in a list as rasterio expects."""
    # json is already imported at the top of the notebook
    return [json.loads(gdf.to_json())['features'][0]['geometry']]


In [None]:
#papermill_description=compute_elevation_statistics

def compute_elevation_statistics(dem_data):
    """
    Compute basic elevation statistics from a digital elevation model (DEM) dataset.

    This function calculates the minimum, maximum, mean, and standard deviation of elevation
    values within the provided DEM data array. It handles the DEM data as a NumPy array,
    which is a common format for raster data in Python.

    Parameters:
    - dem_data (numpy.ndarray): A 2D NumPy array containing elevation data from a DEM raster.
      The array should contain numeric values representing elevation at each cell. No-data
      values should be represented by NaNs in the array to be properly ignored in calculations.

    Returns:
    - dict: A dictionary containing the computed elevation statistics, with keys 'min_elevation',
      'max_elevation', 'mean_elevation', and 'std_dev_elevation'.
    """

    # Compute the minimum elevation, ignoring any NaN values which represent no-data cells
    min_elevation = float(np.nanmin(dem_data))

    # Compute the maximum elevation, ignoring any NaN values
    max_elevation = float(np.nanmax(dem_data))

    # Compute the mean elevation, ignoring any NaN values
    mean_elevation = float(np.nanmean(dem_data))

    # Compute the standard deviation of elevation, ignoring any NaN values
    std_dev_elevation = float(np.nanstd(dem_data))

    # Construct and return a dictionary containing the computed statistics
    stats = {
        'min_elevation': min_elevation,
        'max_elevation': max_elevation,
        'mean_elevation': mean_elevation,
        'std_dev_elevation': std_dev_elevation
    }

    return stats
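
The docstring above asks for no-data cells to be NaN. Many DEM rasters instead store a sentinel such as `-9999`, so a conversion step may be needed first. The sketch below is one way to do that with NumPy. The helper name `nodata_to_nan` and the sentinel value are assumptions for illustration and are not part of `gis_utils`.

```python
import numpy as np


def nodata_to_nan(dem_data, nodata_value):
    """Return a float32 copy of dem_data with nodata_value cells set to NaN."""
    # Integer arrays cannot hold NaN, so convert to float before masking
    dem_float = np.asarray(dem_data, dtype="float32").copy()
    dem_float[dem_float == nodata_value] = np.nan
    return dem_float


# Tiny example grid with one no-data cell
raw = np.array([[120, 135], [-9999, 150]])
clean = nodata_to_nan(raw, -9999)

# The NaN-aware reductions now ignore the no-data cell
print(float(np.nanmin(clean)), float(np.nanmax(clean)), float(np.nanmean(clean)))
```

Without this step, the sentinel would be treated as a real elevation and would distort the minimum, mean and standard deviation.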

### Parameter cell

The cell below is the parameter cell that is required for production notebooks. This cell must have BOTH the `papermill comment` designating it as the parameters cell, and the `tag`.

You will include default values for parameters, and these will be replaced by values provided via the front end of the platform when the notebook is executed. This example includes an area of interest defined as a geojson polygon.

In [None]:
#papermill_description=parameters

notebook_key = "localjupyter"
geojson = {
    'body': {
        "type": "FeatureCollection",
        "name": "dissolved-boundaries",
        "crs": {
            "type": "name",
            "properties": {
                "name": "urn:ogc:def:crs:OGC:1.3:CRS84" 
            }
        },
        "features": [
            {
                "type": "Feature",
                "properties": {
                    "fid": 1
                },
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [
                        [
                            [116.26012130269045, -29.225295369642396],
                            [116.261724812149055, -29.241374854584375],
                            [116.283751968396274, -29.256813692452539],
                            [116.284342735038919, -29.268250184258388],
                            [116.292247755352392, -29.265992437426529],
                            [116.292360282331941, -29.293057573630019],
                            [116.314865678242256, -29.293523728033122],
                            [116.326259034921833, -29.293033039128805],
                            [116.326315298411629, -29.305397680579894],
                            [116.355065941687045, -29.307016748931797],
                            [116.355065941687045, -29.306575187382712],
                            [116.383366477044206, -29.307384715430175],
                            [116.384322956370426, -29.290407813444993],
                            [116.387586238777402, -29.282629879611861],
                            [116.386517232471661, -29.259807919053017],
                            [116.359201308185533, -29.259488866292969],
                            [116.359229439930417, -29.259243440415627],
                            [116.35242155766754, -29.259292525638209],
                            [116.352140240218716, -29.220237788279107],
                            [116.302234524787593, -29.223503148505326],
                            [116.281388901825679, -29.2239696200396],
                            [116.26012130269045, -29.225295369642396]
                        ]
                    ]
                }
            }
        ]
    }
}

# These parameters must also be in the schema.json. See the template schema file for the example that matches this template notebook.
propertyName = "test"
output_type = "overlay"
colormap = "gist_earth"
si_unit = "metres above sea level"
si_unit_short = "m"
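
For local testing, the defaults above can be overridden the same way papermill does it: papermill inserts a new cell after the cell tagged `parameters` and assigns the supplied values there. The sketch below uses `papermill.execute_notebook`, which is part of papermill's public API. How the platform itself invokes the notebook is not shown in this template, so the helper names `build_parameters` and `run_notebook` are hypothetical.

```python
def build_parameters(notebook_key, property_name, geojson_body):
    """Collect override values keyed by the variable names in the parameters cell."""
    return {
        "notebook_key": notebook_key,
        "propertyName": property_name,
        "geojson": {"body": geojson_body},
    }


def run_notebook(input_path, output_path, parameters):
    """Execute a notebook with papermill, overriding the tagged defaults."""
    import papermill as pm  # imported lazily so build_parameters works without it

    return pm.execute_notebook(input_path, output_path, parameters=parameters)


# Build the overrides only; call run_notebook(...) with real paths to execute
params = build_parameters(
    "localjupyter", "test", {"type": "FeatureCollection", "features": []}
)
print(sorted(params))
```

The dictionary keys must match the variable names in the parameters cell exactly, otherwise the defaults are left in place.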


## Filename construction and saving files

### Geotiff files
Geotiff files should have the file extension `.tiff`. Both `.tif` and `.tiff` are valid extensions, but `.tiff` is what we use.

We also use additional attribute extensions to denote the intermediary files that will be discarded after the final file is created. It is important that the final file that is going to be uploaded to S3 end with `_cog.public.tiff`. This is because the platform will look for files with this extension to upload to S3.

### Saving files to be uploaded to S3

When saving files to be uploaded to S3, you must save them in the `/tmp/` directory. This is because the `/tmp/` directory is the only directory that the platform has write access to.

If you are testing locally, this directory may not exist because it is provided by the cloud environment. You can add a boolean flag and alternate between a directory within your workspace and `/tmp/` when testing. An example of how to do this is shown in the `slga.ipynb` production notebook.
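
A minimal sketch of the boolean-flag approach is shown here. It is not the implementation used in `slga.ipynb`. The helper names `get_output_dir` and `build_cog_filename`, the `use_local` flag and the `local_root` default are all assumptions made for illustration.

```python
import os
import tempfile


def get_output_dir(notebook_key, use_local=False, local_root="outputs"):
    """Return (and create) the directory that output files are written to."""
    root = local_root if use_local else "/tmp"
    output_dir = os.path.join(root, notebook_key)
    # Create the folder up front so later writes do not fail
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


def build_cog_filename(output_dir, property_name):
    """Build the final filename, ending in the suffix the platform looks for."""
    return os.path.join(output_dir, f"dem_{property_name}_cog.public.tiff")


# Local test run: write under a temporary workspace folder instead of /tmp/
workspace = tempfile.mkdtemp()
local_dir = get_output_dir("localjupyter", use_local=True, local_root=workspace)
cog_path = build_cog_filename(local_dir, "test")
print(cog_path.endswith("_cog.public.tiff"))
```

Remember to set the flag back to `False` before the notebook is used in production, so that files are written to `/tmp/`.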

In [1]:
#papermill_description=process_variables

# Construct the filenames using propertyName
# name_property-name_attribute.extension
elevation_json_filename = f"/tmp/{notebook_key}/dem_{propertyName}_elevation-stats.json"
output_tiff_filename = f"/tmp/{notebook_key}/dem_{propertyName}.tiff"

output_colored_tiff_filename = f"/tmp/{notebook_key}/dem_colored_{propertyName}.tiff"
output_cog_filename = f"/tmp/{notebook_key}/dem_{propertyName}_cog.public.tiff"


### Passing geojson data to the notebook

The GeoJSON data is passed to the notebook in the `geojson` parameter, under the `body` key. It is then serialised to a string and read into a GeoDataFrame. This is done to comply with how the platform passes data to the notebook.

In [None]:
#papermill_description=processing_file_io

req = geojson
geojson_data = req['body']

# Serialise the GeoJSON to a string and read it into a GeoDataFrame
gdf = gpd.read_file(StringIO(json.dumps(geojson_data)))
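
If the `body` value can arrive either as a JSON string or as an already parsed dictionary, a small guard keeps the rest of the notebook unchanged. This is a sketch under that assumption. The helper `extract_geojson_body` is hypothetical, and the check for a `FeatureCollection` simply mirrors the default value in the parameters cell.

```python
import json


def extract_geojson_body(req):
    """Return the GeoJSON under req['body'] as a dict, parsing it if it is a string."""
    body = req["body"]
    if isinstance(body, str):
        body = json.loads(body)
    if body.get("type") != "FeatureCollection":
        raise ValueError("Expected a GeoJSON FeatureCollection")
    return body


# The same payload, once as a dict and once as a JSON string
as_dict = {"body": {"type": "FeatureCollection", "features": []}}
as_string = {"body": json.dumps(as_dict["body"])}
print(extract_geojson_body(as_dict) == extract_geojson_body(as_string))
```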

### Getting additional spatial data

This notebook also extracts the centre point of the area of interest and the bounding box (bbox). Because polygons can be complicated shapes with many vertices, it is often faster to use the bounding box to extract data from an API. The bounding box is a rectangle that surrounds the polygon. The centre point is the middle of the bounding box.

In [None]:
#papermill_description=processing_bounding_box

# Get bounding box from GeoJSON
bbox = get_bbox_from_geodf(geojson_data)

# Get polygon coordinates in rasterio-friendly format
coords = get_coords_from_geodataframe(gdf)
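
The bounding box and centre point described above can be worked out by hand for a single polygon ring. The sketch below uses plain Python and does not reproduce `get_bbox_from_geodf`, whose internals are not shown here. The helper names and the example ring are assumptions for illustration.

```python
def bbox_from_ring(ring):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of [lon, lat] pairs."""
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return (min(lons), min(lats), max(lons), max(lats))


def bbox_centre(bbox):
    """Return the (lon, lat) midpoint of a bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return ((min_lon + max_lon) / 2, (min_lat + max_lat) / 2)


# A simple closed square ring in longitude/latitude order
example_ring = [
    [116.0, -29.5], [116.5, -29.5], [116.5, -29.0], [116.0, -29.0], [116.0, -29.5],
]
example_bbox = bbox_from_ring(example_ring)
example_centre = bbox_centre(example_bbox)
print(example_bbox, example_centre)
```

Note that this centre is the middle of the bounding box, which can differ from the centroid of an irregular polygon.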

In [None]:
#papermill_description=processing_stac_init

stac_url_dem = "https://explorer.sandbox.dea.ga.gov.au/stac/"
collections_dem = ['ga_srtm_dem1sv1_0']

# Initialize STAC clients
print(f"Initializing STAC client for DEM with URL: {stac_url_dem} and collections: {collections_dem}")
client_dem = initialize_stac_client(stac_url_dem)

In [None]:
#papermill_description=processing_stac_search

# Query STAC catalogs
# query_stac_api was modified to accept polygon + bbox for masking
items_dem = query_stac_api(client_dem, bbox, collections_dem, None, None)
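
For readers without access to `gis_utils`, a comparable search can be written directly against `pystac_client`. This is a sketch only and is not the implementation of `initialize_stac_client` or `query_stac_api`. The helper names `build_search_kwargs` and `search_items` and the `max_items` default are assumptions. `Client.open`, `Client.search` and `ItemSearch.items` are existing `pystac_client` calls.

```python
def build_search_kwargs(bbox, collections, max_items=10):
    """Validate inputs and return keyword arguments for a STAC item search."""
    min_lon, min_lat, max_lon, max_lat = bbox
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("bbox must be ordered (min_lon, min_lat, max_lon, max_lat)")
    return {"bbox": list(bbox), "collections": list(collections), "max_items": max_items}


def search_items(stac_url, bbox, collections):
    """Open the catalogue and return matching items as a list (needs network access)."""
    from pystac_client import Client  # imported lazily; only needed for a real query

    client = Client.open(stac_url)
    search = client.search(**build_search_kwargs(bbox, collections))
    return list(search.items())


# Build the search arguments only; call search_items(...) to run a real query
search_kwargs = build_search_kwargs(
    (116.26, -29.31, 116.39, -29.22), ["ga_srtm_dem1sv1_0"]
)
print(search_kwargs["collections"])
```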

In [None]:
#papermill_description=processing_stac_assets

# Only want the dem asset
item = items_dem[0]
dem_asset = item.assets.get('dem')
fallback_dem = {
    'title': 'dem',
    'href': 'https://dea-public-data.s3-ap-southeast-2.amazonaws.com/projects/elevation/ga_srtm_dem1sv1_0/dem1sv1_0.tif'
}
primary_dem = dem_asset if dem_asset else fallback_dem
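
In the cell above, the asset returned by the STAC item is an object with an `href` attribute, while the fallback is a plain dictionary, and indexing `items_dem[0]` fails if the search returns nothing. Whether `process_dem_asset_and_mask` accepts both forms is not shown in this template. The sketch below is one defensive way to handle both cases. The helpers `first_item_or_none` and `asset_href` and the `FakeAsset` class are hypothetical.

```python
def first_item_or_none(items):
    """Return the first search result, or None when the search found nothing."""
    return items[0] if items else None


def asset_href(asset):
    """Return the href whether the asset is a dict or an object with .href."""
    if isinstance(asset, dict):
        return asset["href"]
    return asset.href


class FakeAsset:
    """Stand-in for a STAC asset object, used only for this illustration."""

    def __init__(self, href):
        self.href = href


example_fallback = {"title": "dem", "href": "https://example.com/fallback.tif"}
example_asset = FakeAsset("https://example.com/primary.tif")

print(asset_href(example_asset), asset_href(example_fallback))
print(first_item_or_none([]))
```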

In [None]:
#papermill_description=processing_dem_asset

# Modified function including mask/clip:
data, metadata, src = process_dem_asset_and_mask(primary_dem, coords, bbox, output_tiff_filename)

In [None]:
#papermill_description=processing

elevation_stats = compute_elevation_statistics(data)

# Serialize 'elevation_stats' to a JSON string
elevation_stats_json = json.dumps(elevation_stats)
# Convert the JSON string to bytes
elevation_stats_bytes = elevation_stats_json.encode()

# output_type signifies the type of asset, e.g. overlay, that is stored in the application DB
asset_metadata = {
    'properties': {
        'output_type': output_type,
        'si_unit': si_unit,
        'si_unit_short': si_unit_short,
        'name': 'DEM',
    },
    'data': {
        'elevation_stats': elevation_stats,
    },
}

In [None]:
#papermill_description=processing_cog

with rasterio.open(output_tiff_filename) as mew:
    meta = mew.meta.copy()
    dst_crs = rasterio.crs.CRS.from_epsg(4326)
    transform, width, height = calculate_default_transform(
        mew.crs, dst_crs, mew.width, mew.height, *mew.bounds
    )

    meta.update({
        'crs': dst_crs,
        'transform': transform,
        'width': width,
        'height': height
    })

    # masked=True tells rasterio to use masking information if present,
    # but the mask itself must be added to the file first.
    tif_data = mew.read(1, masked=True).astype('float32')
    mew_formatted = tif_data.filled(np.nan)

    # The 'terrain' colormap can also be used here.
    # cm.get_cmap is deprecated in newer Matplotlib releases; matplotlib.colormaps[colormap] is the replacement.
    cmap = cm.get_cmap(colormap)
    na = mew_formatted[~np.isnan(mew_formatted)]

    # Use NumPy reductions rather than the much slower Python built-ins
    min_value = na.min()
    max_value = na.max()

    norm = Normalize(vmin=min_value, vmax=max_value)

    coloured_data = (cmap(norm(mew_formatted))[:, :, :3] * 255).astype(np.uint8)

    meta.update({"count":3})


    with rasterio.open(output_colored_tiff_filename, 'w', **meta) as dst:
        reshape = reshape_as_raster(coloured_data)
        dst.write(reshape)

try:
    dst_profile = cog_profiles.get('deflate')
    with MemoryFile() as mem_dst:
        cog_translate(
            output_colored_tiff_filename,
            output_cog_filename,
            config=dst_profile,
            in_memory=True,
            dtype="uint8",
            add_mask=False,
            nodata=0,
            dst_kwargs=dst_profile
        )
    
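    save_metadata_sidecar(output_cog_filename, asset_metadata)
except Exception as err:
    # Chain the original error so the real cause stays visible in the traceback
    raise Exception('Unable to convert to cog') from err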
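
The colour-mapping step in the cell above involves two shape conventions that are easy to confuse: a colormap returns an array shaped `(rows, cols, channels)`, while rasterio writes arrays shaped `(bands, rows, cols)`. The sketch below reproduces the scaling and axis reordering on a tiny array using NumPy only. It swaps the Matplotlib colormap for a plain grey ramp, so it illustrates the array handling and is not a replacement for the cell above. The helper name `to_uint8_rgb_raster` is hypothetical.

```python
import numpy as np


def to_uint8_rgb_raster(values):
    """Scale a 2D float array to 0-255 and return a (3, rows, cols) uint8 array."""
    valid = values[~np.isnan(values)]
    min_value, max_value = valid.min(), valid.max()
    # Guard against a flat surface, where max == min would divide by zero
    span = max_value - min_value if max_value > min_value else 1.0
    scaled = (values - min_value) / span  # 0..1, NaN cells stay NaN
    # No-data cells become 0, matching nodata=0 in the COG conversion
    grey = np.nan_to_num(scaled * 255, nan=0.0).astype(np.uint8)
    rgb = np.stack([grey, grey, grey], axis=-1)  # (rows, cols, 3)
    return np.transpose(rgb, (2, 0, 1))  # (3, rows, cols)


example_dem = np.array([[100.0, 150.0], [np.nan, 200.0]], dtype="float32")
example_raster = to_uint8_rgb_raster(example_dem)
print(example_raster.shape, example_raster.dtype)
```

The final transpose is the same axis reordering that `rasterio.plot.reshape_as_raster` performs.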