# Pathfinder Phase Workshop: Visualising Data

__Description & purpose__: This Notebook is designed to showcase the functionality of the Earth Observation Data Hub (EODH) as the project approaches the end of the Pathfinder Phase. It provides a snapshot of the Hub, the `pyeodh` API client and the various datasets as of February 2025.  

__Author(s)__: Alastair Graham, Dusan Figala, James Hinton, Daniel Westwood

__Date created__: 2025-02-18

__Date last modified__: 2025-02-25

__Licence__: This notebook is licensed under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/).  The code is released using the [BSD-2-Clause](https://www.tldrlegal.com/license/bsd-2-clause-license-freebsd) license.


<span style="font-size:0.75em;">
Copyright (c) , All rights reserved.</span>

<span style="font-size:0.75em;">
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</span>

<span style="font-size:0.75em;">
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>

### Presentation set up

The following cell only needs to be run on the EODH AppHub.  If you have a local Python environment running, please install the required packages as you would normally.

In [None]:
# If needed you can install a package in the current AppHub Jupyter environment using pip
# For instance, we will need at least the following libraries
import sys
!{sys.executable} -m pip install --upgrade pyeodh geopandas matplotlib numpy folium xarray geoviews hvplot holoviews datashader cartopy

# Introduction

In this notebook we will explore different ways to visualise data held on the Hub. The Notebooks can be run on the JupyterHub instance on the Hub (linked to your user account) or locally with a suitably set up environment.

In [None]:
# Imports
import os

# Import the Python API Client
import pyeodh

import folium
import xarray as xr

import hvplot.xarray  # Needed for xarray plotting with geoviews
import holoviews as hv
import datashader  # No `ds` alias here: `ds` is used for the xarray dataset later in this notebook
from holoviews import opts
from holoviews.operation.datashader import rasterize
import cartopy.crs as ccrs  # For coordinate reference systems
import geoviews as gv

import requests
from IPython.display import Image, display

# Parameterise geoviews
gv.extension("bokeh", "matplotlib")

# Accessing Different DataTypes

In this part of the workshop we will look at how to visualise two different datasets. This should give you a foundation for doing the same with other data held on the Hub and elsewhere. The first thing we need to do is connect to the EODH and get information about the collections we are interested in: CMIP6 (`cmip6`) and Sentinel 2 ARD (`sentinel2_ard`).

In [None]:
# Connect to the Hub
# Change base_url to point to the server that you want to connect to

client = pyeodh.Client(
    base_url="https://staging.eodatahub.org.uk"
).get_catalog_service()

catalog = client.get_catalog("supported-datasets/catalogs/ceda-stac-catalogue")

# Get each collection
cmip6 = catalog.get_collection("cmip6")
sentinel2_ard = catalog.get_collection("sentinel2_ard")

We can interrogate the temporal and spatial extents of the data holdings. Below, we print the temporal extent of the CMIP6 collection and the spatial extent of the Sentinel 2 ARD collection. This can help us understand what date ranges or spatial extents we can use when visualising the data.
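
As a rough illustration of what those extents contain, the sketch below unpacks a STAC-style extent dictionary. The `example_extent` values and the `summarise_extent` helper are hypothetical and are not taken from the Hub. The sketch assumes a 2D bounding box ordered west, south, east, north.

```python
# Hypothetical extent dictionary in the shape a STAC collection reports.
# The numbers are illustrative only, not real values from the Hub.
example_extent = {
    "spatial": {"bbox": [[-9.0, 49.0, 3.0, 61.0]]},
    "temporal": {"interval": [["2015-01-01T00:00:00Z", None]]},
}


def summarise_extent(extent):
    """Flatten the first bounding box and time interval into one dictionary."""
    # Assumes a 2D bbox: [west, south, east, north]
    west, south, east, north = extent["spatial"]["bbox"][0]
    start, end = extent["temporal"]["interval"][0]
    return {
        "west": west,
        "south": south,
        "east": east,
        "north": north,
        "start": start,
        # STAC uses null (None in Python) for an open-ended interval
        "end": end if end is not None else "open-ended",
    }


summary = summarise_extent(example_extent)
print(summary)
```

With a real collection you could pass `cmip6.extent.to_dict()` to the same helper, provided its bounding box is also 2D.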

In [None]:
# Get the collection metadata
extent = ['spatial', 'temporal']

print("Dataset extent - CMIP6: ", cmip6.extent.to_dict()[extent[1]])
print("Dataset extent - Sentinel 2 ARD: ", sentinel2_ard.extent.to_dict()[extent[0]])

As this is an exercise, we have already chosen some data to look at.

The CMIP6 item has been generated using the CIESM model, running the high-emission SSP5-8.5 scenario. It contains monthly-averaged upward shortwave radiation at the surface (rsus) on a regular grid.

The Sentinel 2 ARD image covers an area across the south of England, including the Solent.
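
The CMIP6 item identifier used in the next cell encodes the facts described above. As a minimal sketch, it can be split into named facets, assuming it follows the standard ten-part CMIP6 dataset naming convention. The `parse_cmip6_id` helper is hypothetical and is not part of `pyeodh`.

```python
# Facet names from the CMIP6 dataset naming convention, in order
CMIP6_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_cmip6_id(item_id):
    """Split a dot-separated CMIP6 dataset identifier into named facets."""
    parts = item_id.split(".")
    if len(parts) != len(CMIP6_FACETS):
        raise ValueError(f"Expected {len(CMIP6_FACETS)} facets, got {len(parts)}")
    return dict(zip(CMIP6_FACETS, parts))


facets = parse_cmip6_id(
    "CMIP6.ScenarioMIP.THU.CIESM.ssp585.r1i1p1f1.Amon.rsus.gr.v20200806"
)
for name, value in facets.items():
    print(f"{name:>15}: {value}")
```

Here `source_id` is the model (CIESM), `experiment_id` is the scenario (ssp585), `table_id` `Amon` indicates monthly atmospheric data, and `variable_id` is `rsus`.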

In [None]:
# Look for a specific item
cmip6_item = cmip6.get_item(
    "CMIP6.ScenarioMIP.THU.CIESM.ssp585.r1i1p1f1.Amon.rsus.gr.v20200806"
)
sentinel_item = sentinel2_ard.get_item(
    "neodc.sentinel_ard.data.sentinel_2.2023.11.17.S2A_20231117_latn509lonw0008_T30UXB_ORB137_20231117131218_utm30n_osgb"
)

In [None]:
# Get information and links to the item assets held in STAC

print("---"*20, "CMIP6", "---"*20)

print(cmip6_item.assets)

print("")
print("---"*20, "SENTINEL 2", "---"*20)

print(sentinel_item.assets)


We can then use the information above to gather links to the asset data.

In [None]:
cmip6_kerchunk_asset = cmip6_item.assets["reference_file"]
sentinel2_ard_cog_asset = sentinel_item.assets["cog"]

# print the href
print(cmip6_kerchunk_asset.href)
print(sentinel2_ard_cog_asset.href)

# Manipulating and plotting data 

## Plot a single timestep

For the next part of this exercise we will undertake the following steps:
 - Access our CMIP6 Dataset
 - Extract the dataset (Kerchunk)
 - Select a single timestep
 - Plot resulting image

In [None]:
# Get information on the cloud product
product = cmip6_item.get_cloud_products()
product

We can see that this item contains a single product, a Kerchunk reference file, which allows us to access the dataset as a single object. The format doesn't actually matter here, as the tools we are using allow us to open any recognised format in xarray with minimal knowledge of the data format on the user side.

The following code cell opens the file as an xarray dataset and presents the user with information about the array structure.

In [None]:
ds = product.open_dataset()
ds

Looking at the output above we can see that the dataset has four `Data variables`. We can perform various selections on the data, but specifically we need to choose the variable containing the data to plot, and then select the time step we are interested in. We will use the `geoviews` package to plot the resulting map.
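
To get a feel for which month a given time index refers to, the sketch below converts an index into a year and month. It assumes the time axis holds monthly steps starting in January 2015, which is the usual start of ScenarioMIP runs. That start date is an assumption, so confirm it against `ds.time` in the real dataset. The `index_to_year_month` helper is hypothetical.

```python
def index_to_year_month(index, start_year=2015, start_month=1):
    """Convert a zero-based monthly time index into (year, month).

    Assumes evenly spaced monthly steps beginning at start_year/start_month.
    """
    months = (start_month - 1) + index
    return start_year + months // 12, months % 12 + 1


# The index used later in this notebook
print(index_to_year_month(555))
```

Under the same assumption, index 0 maps to January 2015, and an axis of 1032 monthly steps (86 years) would end in December 2100.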

In [None]:
data_var = list(ds.data_vars)[2]  # Choose the data variable (modify if needed)
print("Variable: ", data_var)

timestep = 555  # Choose any number within the range presented in the xarray output, i.e. up to 1032 in this case


In [None]:
selected_data = ds[data_var].isel(time=timestep)  # Select the data for the time step

# Plot the data
plot = selected_data.hvplot.quadmesh(
    x='lon', y='lat', cmap='plasma', geo=True, 
    projection=ccrs.PlateCarree(), coastline=True, 
    title=f"Time Step: {ds.time.values[timestep]}"
)
plot


## Plot an NDVI Analysis 

Next we will use the Sentinel 2 ARD dataset to create a simple normalised difference vegetation index (NDVI) - the 'Hello World' of the Earth observation sector! 

NDVI is a widely used metric that assesses vegetation health and density by comparing near-infrared (NIR) and red light reflectance. Healthy vegetation absorbs most red light for photosynthesis while strongly reflecting NIR, resulting in high NDVI values (close to +1). Sparse or unhealthy vegetation has lower values, non-vegetated surfaces such as bare ground and built-up areas have values near zero, and water is typically negative.
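
The index is calculated as NDVI = (NIR - Red) / (NIR + Red). The sketch below works through that arithmetic for single pixels. The reflectance values are hypothetical, chosen only to illustrate the three cases described above.

```python
import math


def ndvi(nir, red):
    """Return the NDVI for one pixel, or NaN when both bands are zero."""
    denominator = nir + red
    if denominator == 0:
        return math.nan  # Avoid dividing by zero on empty (nodata) pixels
    return (nir - red) / denominator


# Hypothetical (NIR, red) reflectance pairs, for illustration only
samples = {
    "dense vegetation": (0.50, 0.05),
    "bare soil": (0.30, 0.25),
    "water": (0.02, 0.05),
}

for label, (nir_value, red_value) in samples.items():
    print(f"{label}: {ndvi(nir_value, red_value):+.2f}")
```

The same expression applies element-wise to whole arrays, which is how it is used on the xarray data later in this section.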

We will undertake the following steps:
 - Access a Sentinel 2 ARD Dataset.
 - Open the dataset (COG) file.
 - Perform calculation for NDVI analysis over an AOI.
 - Plot resulting image.

We can access the selected search result as before, and open the cloud product, which this time is a Cloud Optimised GeoTIFF (COG) file.


In [None]:
product = sentinel_item.get_cloud_products()
print(product)
ds1 = product.open_dataset()
ds1

We are using a very simple area of interest (AOI) for this exercise - we define the AOI as slice ranges to apply to the xarray DataArray.

There are other spatially contiguous methods for defining AOIs that require handling of projections and spatial file formats that are out of scope for this exercise. These would all work using the data we are manipulating here.

In [None]:
# Create a data slice to act as an AOI
crop_ds1 = ds1.isel(x=slice(1200, 2200), y=slice(2200, 3200))

In [None]:
# Extract the red and nir bands for the AOI
red = crop_ds1.isel(band=2)
nir = crop_ds1.isel(band=6)

# Calculate NDVI
ndvi = (nir - red) / (nir + red)

In [None]:
# Plot the NDVI output
# Note: clim sets the colourmap limits to allow a data stretch
plot = ndvi.hvplot.quadmesh(
    x='x', y='y', cmap='RdYlGn', colorbar=True, clim=(0, 1), title="Vegetation (NDVI) Plot"
)
plot

# Streaming data using Titiler

In this section we will investigate how to access tiled data that is being served by `titiler`. Titiler is a lightweight, fast and customisable dynamic tile server for geospatial data, built on FastAPI and Rasterio. It enables on-the-fly serving of raster tiles, mosaics, and previews from cloud-optimized GeoTIFFs (COGs) and other raster sources. Designed for scalability and efficiency, Titiler supports advanced features like dynamic tiling, custom styling, and integration with modern web mapping applications. It is increasingly used in geospatial analysis, remote sensing, and web-based GIS applications.

## Using Preview with COGs

Here we can use the `Preview` feature to customise some of the parameters we are going to use to dynamically visualise the data.

- `bidx` is the band index we want to preview. In this case, we are previewing the first band.
- `rescale` is the range of values we want to display. In this case, we are rescaling the values between 9 and 255. 
- `colormap_name` is the name of the colormap we want to use. In this case, we are using the `rain_r` colormap.

To experiment, try different colormaps. Here are a few to start with:

`'accent',  'afmhot', 'afmhot_r', 'algae', 'algae_r', 'amp_r', 'autumn', 'balance', 'binary', 'binary_r', 'blues',  'bone', 'bone_r', 'hot', 'viridis', ...`

In [None]:
# Set up the parameters 
ENVIRONMENT = 'staging'

COG_PREVIEW_URL = f'https://{ENVIRONMENT}.eodatahub.org.uk/titiler/core/cog/preview'
COG_PREVIEW_PARAMS = {
    'url': sentinel2_ard_cog_asset.href,
    'bidx': 1,
    'rescale': '9,255',
    'colormap_name': 'rain_r'
}

# Request the titiler preview
response = requests.get(COG_PREVIEW_URL, params=COG_PREVIEW_PARAMS)

# Display the image
image = Image(response.content)
display(image)

## Using Preview with Kerchunk

Let's follow a similar approach for the xarray data. This xarray data is a Kerchunk dataset - remember that the variable we want to plot is called `rsus`. We can preview the data similarly to how we previewed the COG data.

This time, focus on the `rescale` parameter. This parameter is used to rescale the values of the data. In this example we are rescaling between `-50,100`, but try something new like `50,400` and see how the data changes.
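
To see why the `rescale` range changes the picture, the sketch below maps data values onto the 0 to 255 range that a colormap is applied to. It assumes a simple linear stretch with clamping at both ends. That is an illustration of the general idea, not a copy of the server's implementation, and `rescale_to_byte` is a hypothetical helper.

```python
def rescale_to_byte(value, in_min, in_max):
    """Linearly map value from [in_min, in_max] to 0-255, clamping outliers."""
    fraction = (value - in_min) / (in_max - in_min)
    fraction = min(max(fraction, 0.0), 1.0)  # Values outside the range saturate
    return round(fraction * 255)


# Compare the two rescale ranges suggested in the text (values in W m-2)
for in_range in [(-50, 100), (50, 400)]:
    scaled = [rescale_to_byte(v, *in_range) for v in (0, 100, 200)]
    print(in_range, scaled)
```

With the narrower range, everything at or above 100 saturates at the top of the colormap, while the wider range keeps higher values distinguishable.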

In [None]:
# Set up the parameters
ENVIRONMENT = 'staging'

XARRAY_PREVIEW_URL = f'https://{ENVIRONMENT}.eodatahub.org.uk/titiler/xarray/tiles/0/0/0'
XARRAY_PARAMS = {
    'url': cmip6_kerchunk_asset.href,
    'variable': 'rsus',
    'rescale': '-50,100',
    'colormap_name': 'plasma',
    'reference' : 'true'
}

# Build the equivalent request URL from the same parameters, for reference
constructed_url = requests.Request('GET', XARRAY_PREVIEW_URL, params=XARRAY_PARAMS).prepare().url
print(constructed_url)

# Request the titiler preview
response = requests.get(XARRAY_PREVIEW_URL, params=XARRAY_PARAMS)

# Display the image
image = Image(response.content)
display(image)

## Exploring the Data Using OGC XYZ Tiles

Now that we have previewed the datasets, we can use third party tools to also explore the data. To do this, we are going to create an `XYZ` tile endpoint for both the COG data and the Kerchunk data. This can then be used in any third-party tool that supports XYZ tiles e.g. QGIS, OpenLayers, Leaflet, etc.

The following code cell builds the URLs we require based on some of the variables that we have already set. It then prints the two URLs that we will need to display and visualise the data.
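
The cell below joins the parameters by hand, which works for the values used here. As an alternative sketch, the standard library's `urlencode` percent-encodes each value, which is safer if an asset link contains characters such as `&` or `?`. The base URL and parameters in this example are hypothetical placeholders, and `build_xyz_url` is not part of any library.

```python
from urllib.parse import urlencode


def build_xyz_url(template, params):
    """Append percent-encoded query parameters to an XYZ tile URL template.

    Only the parameters are encoded, so the {z}/{x}/{y} placeholders in the
    template are left untouched for the map client to fill in.
    """
    return f"{template}?{urlencode(params)}"


# Hypothetical template and parameters, for illustration only
example_template = "https://example.org/titiler/tiles/WebMercatorQuad/{z}/{x}/{y}"
example_params = {
    "url": "https://example.org/data/scene.tif",
    "bidx": 1,
    "rescale": "9,255",
}

example_xyz = build_xyz_url(example_template, example_params)
print(example_xyz)
```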

In [None]:
# Generate XYZ For COG
COG_OGC_URL = 'https://' + ENVIRONMENT + '.eodatahub.org.uk/titiler/core/cog/tiles/WebMercatorQuad/{z}/{x}/{y}'
COG_XYZ = COG_OGC_URL + '?' + '&'.join([f'{k}={v}' for k, v in COG_PREVIEW_PARAMS.items()])

# Generate XYZ For Xarray
XARRAY_OGC_URL = 'https://' + ENVIRONMENT + '.eodatahub.org.uk/titiler/xarray/tiles/WebMercatorQuad/{z}/{x}/{y}'
XARRAY_XYZ = XARRAY_OGC_URL + '?' + '&'.join([f'{k}={v}' for k, v in XARRAY_PARAMS.items()])

print('COG XYZ: ', COG_XYZ)
print('Kerchunk XYZ: ', XARRAY_XYZ)

We can then use a map viewer package such as `folium` to ingest the `titiler` URL and display it in an interactive way. We do this below for both datasets.
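
A map viewer fills in the `{z}`, `{x}` and `{y}` placeholders for every tile it needs. The sketch below uses the standard Web Mercator tile formula to show which tile contains a given longitude and latitude. The `lonlat_to_tile` helper is hypothetical, and the example coordinates are only approximate for the Sentinel 2 scene.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) Web Mercator tile index containing a lon/lat point."""
    n = 2 ** zoom  # Number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the edge of the map stay within the tile grid
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# At zoom 0 the whole world is a single tile
print(lonlat_to_tile(-0.8, 50.9, 0))

# Higher zoom levels split the world into more, smaller tiles
print(lonlat_to_tile(-0.8, 50.9, 10))
```

This is why the earlier Kerchunk preview requested `tiles/0/0/0`: at zoom level 0 that one tile covers the whole globe.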

In [None]:
# For the Sentinel dataset
m = folium.Map(location=[54.5, -4.5], zoom_start=6)

# Add the TiTiler layer
folium.raster_layers.TileLayer(
    tiles=COG_XYZ,
    attr="TiTiler",
    name="Sentinel 2 ARD Scene",
    overlay=True
).add_to(m)

# Add a layer control
folium.LayerControl().add_to(m)

# Display the map
m

In [None]:
# For the CMIP6 dataset
m = folium.Map(location=[0, 0], zoom_start=2)

# Add the TiTiler layer
folium.raster_layers.TileLayer(
    tiles=XARRAY_XYZ,
    attr="TiTiler",
    name="CMIP6 Surface Upwelling Shortwave Radiation (W m-2)",
    overlay=True
).add_to(m)

# Add a layer control
folium.LayerControl().add_to(m)

# Display the map
m

In this notebook, we have seen how to access and visualise different data types in different ways.

Extra functionality includes the ability to visualise your own Workspace data by passing in your Workspace API key (that you can generate from the `Workspace` tab). You can then visualise your own data on a map.

This means you can visualise any of your workflow outputs, commercial data or any other visualisable data you have in your Workspace.

An example request would be:
```python
TITILER_PREVIEW_URL = f'https://{ENVIRONMENT}.eodatahub.org.uk/titiler/core/cog/preview'
TITILER_PREVIEW_PARAMS = {
    'url': WORKFLOW_OUTPUT_ASSET,
    'bidx': 1,
    'rescale': '9,255',
    'colormap_name': 'rain_r'
}

response = requests.get(TITILER_PREVIEW_URL, params=TITILER_PREVIEW_PARAMS, headers={'Authorization': f'Bearer {WORKSPACE_API_KEY}'}) # Note the headers parameter
```

Here we are passing the `WORKSPACE_API_KEY` in the headers of the request. This will allow you to visualise your own private data straight from the Platform.
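
To avoid pasting the key directly into a notebook, one option is to read it from an environment variable. The sketch below assumes a variable named `EODH_WORKSPACE_API_KEY`. That name is hypothetical, not an EODH convention, and the fallback value is a placeholder so the example runs.

```python
import os


def build_auth_headers(api_key):
    """Return the request headers carrying a Workspace API key."""
    if not api_key:
        raise ValueError("No Workspace API key provided")
    return {"Authorization": f"Bearer {api_key}"}


# Hypothetical environment variable name; the fallback is a placeholder only
api_key = os.environ.get("EODH_WORKSPACE_API_KEY", "example-key-for-illustration")
headers = build_auth_headers(api_key)

# Print only the header names, so the key itself never appears in the output
print(sorted(headers))
```

The resulting `headers` dictionary can be passed to `requests.get` through its `headers` argument, as in the request above.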