# Execution template jupyter notebooks - draft

This notebook is a generic template for the data notebooks repository. Use it as a starting point so that, as the notebook library grows, notebooks stay consistent in formatting and include the required core packages, parameters and tags.
Please refer to the README.md for more information and a more detailed breakdown of the templates.

## Execution templates

Execution notebook templates have some additional requirements over the exploration notebooks.

Execution notebooks must include papermill comments and tags.

### Parameter definition and tags

For execution notebooks to work, both `papermill comments` and `cell tags` must be included. Each cell in this template has these included. `Papermill comments` are different from regular comments, though the syntax is the same. Take care not to remove the papermill comments when cleaning up a notebook for production.
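
One way to guard against accidentally stripped comments or tags is a small check over the notebook JSON before committing. The sketch below is a hypothetical helper (`find_incomplete_cells` and `load_notebook` are not part of this repository or of papermill). It assumes the standard `.ipynb` layout, where each cell stores its code under `source` and its tags under `metadata.tags`.

```python
import json

PAPERMILL_PREFIX = "#papermill_description="


def load_notebook(path):
    """Read an .ipynb file into a plain dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_incomplete_cells(notebook):
    """Return (index, problem) pairs for code cells missing a comment or tag."""
    problems = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        lines = [line.strip() for line in source.splitlines()]
        has_comment = any(
            line.replace(" ", "").startswith(PAPERMILL_PREFIX) for line in lines
        )
        if not has_comment:
            problems.append((index, "missing papermill comment"))
        if not cell.get("metadata", {}).get("tags"):
            problems.append((index, "missing cell tag"))
    return problems


# Tiny in-memory notebook used only for illustration
example_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": [
                "#papermill_description=parameters\n",
                "property_name = 'test_farm'",
            ],
        },
        {"cell_type": "code", "metadata": {}, "source": ["print('untagged')"]},
    ]
}

problems = find_incomplete_cells(example_notebook)
```

For a notebook on disk, `find_incomplete_cells(load_notebook(path))` would list any code cells that still need attention.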

### Parameter cell
One cell must be designated `parameters`. This cell must have BOTH the `papermill comment` designating it as the parameters cell, and the `tag`.

Example:

#### papermill comment:
`papermill_description=parameters`

#### Jupyter tags in vscode:
- use the 'more actions' button in the top right of a cell to access tags
- select `add cell tag`
- make cell tag `parameters`
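
Once a cell carries the `parameters` tag, papermill can override its default values at run time. The following is a minimal sketch, assuming papermill is installed and that `property_name` and `notebook_key` are defined in the tagged cell. The notebook paths and the `build_parameters` and `run_notebook` helpers are hypothetical.

```python
def build_parameters(property_name, notebook_key):
    """Collect the values to inject after the cell tagged `parameters`."""
    return {"property_name": property_name, "notebook_key": notebook_key}


def run_notebook(input_path, output_path, parameters):
    """Execute a notebook with papermill, overriding the parameter defaults."""
    # Imported lazily so build_parameters can be used without papermill installed
    import papermill as pm

    return pm.execute_notebook(input_path, output_path, parameters=parameters)


params = build_parameters("test_farm", "localjupyter")

# Hypothetical paths, shown for illustration only:
# run_notebook("execution-template.ipynb", "output/execution-template-run.ipynb", params)
```

Papermill writes the supplied values into a new cell placed directly after the tagged one, so the defaults in the template remain visible in the executed copy.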




# Setup
## Library import

### Install packages
Install any required packages if they aren't included in the data-notebooks environment.
To avoid rebuilding the dev container continuously, you should install packages within the data notebook itself. If a package is being used consistently across multiple notebooks, then it can be considered for inclusion in the dev container.
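
In a notebook cell, `%pip install <package>` is usually enough. Where the install should only happen when a package is actually missing, something like the sketch below could be used. The helper names and the placeholder package name are hypothetical, and note that an import name can differ from the name used to install the package.

```python
import importlib.util
import subprocess
import sys


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


def install_packages(package_names):
    """Install packages with pip into the interpreter running this notebook."""
    if package_names:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *package_names])


# "json" ships with Python; the second name is a made-up placeholder
to_install = missing_packages(["json", "hypothetical_package_name_xyz"])

# install_packages(to_install)  # uncomment once the names are real packages
```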

### Import packages
Import the required Python packages.

- Add a `papermill_description` comment to every cell.
- Add tags to every cell.
- The parameters cell MUST have both the `parameters` tag and the papermill comment.
- Include a notebook key for executable notebooks.
- Papermill descriptions are written as comments and are referred to as papermill comments; they are separate from tags.
- Exploratory vs executable notebook templates: make an example of both.

In [None]:
#papermill_description=imports

# Core packages
import json
import os
from io import StringIO

# Data manipulation
import numpy as np
import pandas as pd

# Geospatial
import geopandas as gpd
import rasterio
import pystac_client
import rioxarray
from gis_utils.stac import initialize_stac_client, query_stac_api
from geodata_fetch import settingshandler, harvest

# Visualisations in notebook
from IPython.display import display, JSON
import holoviews as hv
import geoviews as gv
import panel as pn

# Exporting data
import boto3
from aws_utils import S3Utils

# This is a GDAL flag; it does not impact AWS access. It is used for accessing
# public buckets, which we do for some AWS earth data repositories.
os.environ['AWS_NO_SIGN_REQUEST'] = 'YES'

hv.extension('bokeh')

## Functions

In [None]:
#papermill_description=get_bbox_from_geodf

def get_bbox_from_geodf(geojson_data):
    """
    Extract the bounding box from a GeoJSON-like dictionary.
    
    Parameters:
    - geojson_data (dict): The GeoJSON data as a Python dictionary.
    
    Returns:
    - A list representing the bounding box [min_lon, min_lat, max_lon, max_lat].
    """
    gdf = gpd.GeoDataFrame.from_features(geojson_data["features"])
    bbox = list(gdf.total_bounds)
    return bbox

In [None]:
#papermill_description=compute_elevation_statistics

def compute_elevation_statistics(dem_data):
    """
    Compute basic elevation statistics from a digital elevation model (DEM) dataset.

    This function calculates the minimum, maximum, mean, and standard deviation of elevation
    values within the provided DEM data array. It handles the DEM data as a NumPy array,
    which is a common format for raster data in Python.

    Parameters:
    - dem_data (numpy.ndarray): A 2D NumPy array containing elevation data from a DEM raster.
      The array should contain numeric values representing elevation at each cell. No-data
      values should be represented by NaNs in the array to be properly ignored in calculations.

    Returns:
    - dict: A dictionary containing the computed elevation statistics, with keys 'min_elevation',
      'max_elevation', 'mean_elevation', and 'std_dev_elevation'.
    """

    # Compute the minimum elevation, ignoring any NaN values which represent no-data cells
    min_elevation = float(np.nanmin(dem_data))

    # Compute the maximum elevation, ignoring any NaN values
    max_elevation = float(np.nanmax(dem_data))

    # Compute the mean elevation, ignoring any NaN values
    mean_elevation = float(np.nanmean(dem_data))

    # Compute the standard deviation of elevation, ignoring any NaN values
    std_dev_elevation = float(np.nanstd(dem_data))

    # Construct and return a dictionary containing the computed statistics
    stats = {
        'min_elevation': min_elevation,
        'max_elevation': max_elevation,
        'mean_elevation': mean_elevation,
        'std_dev_elevation': std_dev_elevation
    }

    return stats

In [None]:
#papermill_description=map_visualisation

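def list_tif_files(path):
    """Return the names of files in `path` that end with 'cm.tif'."""
    return [f for f in os.listdir(path) if f.endswith('cm.tif')]

# Function to load and display the selected .tif file
def load_and_display_tif(filename, path):
    # `path` is the directory that contains `filename`
    filepath = os.path.join(path, filename)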
    img = gv.util.from_xarray(rioxarray.open_rasterio(filepath).rio.reproject('EPSG:3857'))
    
    # Define map tiles and create the map image
    map_tiles = gv.tile_sources.EsriImagery().opts(width=1000, height=600)
    map_img = gv.Image(img, kdims=['x', 'y']).opts(cmap='viridis', title=filename)
    map_combo = map_tiles * map_img
    
    return map_combo

## Data import

In [None]:
#papermill_description=processing_file_io

# paths for input and output directories.
input_dir = '/workspace/notebooks/sandbox/data/input-data'
output_dir = '/workspace/notebooks/sandbox/data/output-data'

# Path for the input geojson file. This file will then be imported as a geodataframe
input_geojson_filename = 'dissolved-boundaries.geojson'
input_geom = os.path.join(input_dir, input_geojson_filename)

# filename for the getdata harvester settings that will be generated from parameters.
geodata_params_fname = 'settings_showcase.json'
geodata_params = os.path.join(output_dir,geodata_params_fname)


property_name = "test_farm"
notebook_key = "localjupyter"


# Import the chosen geometry file as a geodataframe
geom = gpd.read_file(input_geom)

## Data processing

In [None]:
#papermill_description=process_input_geometry

# Setting parameters to create the settings.json file

# Latitude and longitude of the centroid of the first geometry in the input file:
colname_lat = geom.centroid.y[0]
colname_lng = geom.centroid.x[0]

# Bounding box: Left (min Long), Bottom (min Lat), Right (max Long), Top (max Lat)
target_bbox = list(geom.total_bounds)



In [None]:
#papermill_description=processing_geodata_input_parameters


# Resolution of data download in arcseconds (1 arcsec ~ 30m)
target_res = 3

date_start = "2022-10-01"
date_end = "2022-11-30"
# Number of time intervals to split the image collection into
time_intervals = 0

# This example is only selecting one target source. See other example notebooks for more complicated examples of fetching multiple data sources
target_sources = {"DEM":"DEM"}

json_data = {
    "infile": property_name,
    "outpath": output_dir,
    "target_centroid_lat": colname_lat,
    "target_centroid_lng": colname_lng,
    "target_bbox": target_bbox,
    "target_res": str(target_res),
    "date_start": date_start if date_start is not None else "2022-10-01", #a date of some kind must be provided or the harvester complains
    "date_end": date_end if date_end is not None else "2022-11-30",
    "time_intervals": time_intervals,
    "target_sources": target_sources
}

#write out the parameters as a json file. This can be replaced with an API call at a later date.

with open(geodata_params, "w", encoding='utf-8') as file:
    json.dump(json_data, file, ensure_ascii=False, indent=4)

In [None]:
#papermill_description=parameters

# store settings as namespace (easier to interact with)
settings = settingshandler.main(geodata_params)

In [None]:
#papermill_description=geodata_collection

df = harvest.run(geodata_params)

## Data Visualisation and export

While you can include visualisation cells in the production notebooks, they do slow down the notebook execution and so should be commented out or controlled using boolean flags. This way they can be used during local testing, and turned off in production.
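
A boolean flag can be wired in along these lines. This is only a sketch: `ENABLE_VISUALISATION`, `maybe_visualise` and the placeholder render function are hypothetical names, not part of the template.

```python
# Hypothetical flag: set to True for local testing, False in production
ENABLE_VISUALISATION = False


def maybe_visualise(render, enabled=ENABLE_VISUALISATION):
    """Call `render()` only when visualisation is enabled; otherwise skip it."""
    if not enabled:
        return None
    return render()


def render_placeholder_map():
    # Stand-in for an expensive plotting call such as load_and_display_tif(...)
    return "map rendered"


skipped = maybe_visualise(render_placeholder_map)
shown = maybe_visualise(render_placeholder_map, enabled=True)
```

Declaring the flag in the `parameters` cell would let papermill switch visualisation on or off per run without editing the notebook.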

In [None]:
#papermill_description=processing_s3

# Load AWS credentials from environment variables
aws_access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')
aws_default_region = 'us-east-1'
bucket_name = 'jenna-remote-sensing-sandbox'

s3_client = S3Utils(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=aws_default_region,
    s3_bucket=bucket_name,
    prefix=notebook_key
)

In [None]:
#papermill_description=elevation_process_variables

# Construct the filenames using property_name
elevation_json_filename = f"dem_{property_name}_elevation_stats.json"

#output tif name hardcoded for now but will be dynamically read later
output_tiff_filename = 'DEM_SRTM_1_Second_Hydro_Enforced.tif'

Read in the collected DEM and compute some statistics on it.
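
The statistics are only meaningful if no-data cells are NaN, because the NaN-aware NumPy functions skip NaN but treat any other sentinel as a real elevation. The sketch below illustrates this with a made-up grid; the `-9999` sentinel is an assumption, and real rasters declare their own no-data value.

```python
import numpy as np

NODATA = -9999.0  # assumed sentinel, for illustration only

# Made-up 2x3 elevation grid with one no-data cell
dem = np.array([[10.0, 20.0, NODATA], [30.0, 40.0, 50.0]])

# Without masking, the sentinel is treated as a real elevation
unmasked_min = float(np.nanmin(dem))

# Replace the sentinel with NaN so the NaN-aware functions skip it
masked = np.where(dem == NODATA, np.nan, dem)
masked_min = float(np.nanmin(masked))
masked_mean = float(np.nanmean(masked))
```

With rioxarray, passing `masked=True` to `open_rasterio` is one way to get NaN in place of the declared no-data value.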

In [None]:
#papermill_description=read_in_tif_to_compute_stats


# Read the collected DEM GeoTIFF. masked=True converts no-data cells to NaN
# so that the elevation statistics ignore them.
dem_tiff_dir = os.path.join(output_dir, output_tiff_filename)
data = rioxarray.open_rasterio(dem_tiff_dir, masked=True)


In [None]:
#papermill_description=calculate_elevation_stats


elevation_stats = compute_elevation_statistics(data)

# Serialize 'elevation_stats' to a JSON string
elevation_stats_json = json.dumps(elevation_stats)
# Convert the JSON string to bytes
elevation_stats_bytes = elevation_stats_json.encode()

# print elevation stats as a check:
elevation_stats

Save to AWS S3 bucket

In [None]:
#papermill_description=upload_dem_to_s3

s3_client.upload_file(file_path=dem_tiff_dir, file_name=output_tiff_filename)

In [None]:
#papermill_description=list_s3_files

s3_client.list_files()

In [None]:
#papermill_description=generate_presigned_urls

s3_client.generate_presigned_urls()