# Coastal turbidity visualisation   <img align="right" src="../Supplementary_data/dea_logo.jpg">

* [**Sign up to the DEA Sandbox**](https://docs.dea.ga.gov.au/setup/sandbox.html) to run this notebook interactively from a browser
* **Compatibility:** Notebook currently compatible with the `DEA Sandbox` environment only
* **Products used:** 
[ga_s2am_ard_3](https://explorer.sandbox.dea.ga.gov.au/products/ga_s2am_ard_3),
[ga_s2bm_ard_3](https://explorer.sandbox.dea.ga.gov.au/products/ga_s2bm_ard_3)

## Background
Turbidity refers to the degree of clarity or the presence of suspended particles in a liquid. 
In the context of water, it is an optical property that quantifies the amount of light scattered by substances within the water sample when illuminated.
Increased turbidity is indicative of a higher intensity of scattered light, which can be caused by various materials such as clay, silt, microscopic organisms, algae, dissolved organic compounds, and other fine inorganic and organic matter. 
Elevated levels of particulate matter often represent increases in pollutants which can have detrimental effects upon ecosystems. [Read more here.](https://www.usgs.gov/special-topics/water-science-school/science/turbidity-and-water#:~:text=Turbidity%20is%20the%20measure%20of,light%2C%20the%20higher%20the%20turbidity.)

Traditional survey methods for measuring turbidity, such as collecting water samples and using Secchi disks, can be time-consuming, costly, and limited in terms of spatial coverage and frequency of measurements. 
These methods require manual data collection, laboratory analysis, and on-site deployments. In contrast, remote sensing through satellites offers a cost-effective and efficient alternative. 
Satellite remote sensing provides wide coverage and frequent data acquisition, allowing for a comprehensive assessment of turbidity over large areas.
One widely used remote sensing index for turbidity estimation is the Normalised Difference Turbidity Index (NDTI). 
The NDTI quantifies the difference in reflectance between specific spectral bands, which correlates with suspended sediment and turbidity levels. 
It's important to note that NDTI values do not directly represent defined turbidity values in units such as NTU (Nephelometric Turbidity Units). 
However, by calibrating the NDTI with in-situ turbidity data recorded concurrently during image capture, it becomes possible to establish a correlation between NDTI values and actual turbidity measurements. 
This calibration enables NTU estimation, providing valuable insights into water quality. 
The NDTI equation is defined below: 

\begin{equation}
NDTI = \frac{(Red - Green)}{(Red + Green)}
\end{equation}
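
As a rough numerical illustration of the formula above, the sketch below computes NDTI with NumPy for three made-up pixels. The reflectance values and the helper name `normalised_difference` are assumptions chosen for illustration; they are not taken from the Sentinel-2 data used later in this notebook.

```python
import numpy as np


def normalised_difference(a, b):
    """Return (a - b) / (a + b) element-wise, with NaN where a + b == 0."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    total = a + b
    # Suppress divide-by-zero warnings; those pixels are set to NaN below
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total != 0, (a - b) / total, np.nan)


# Synthetic surface reflectance values (illustrative only, not real data)
red = np.array([300.0, 600.0, 900.0])
green = np.array([600.0, 600.0, 600.0])

# NDTI = (Red - Green) / (Red + Green)
ndti = normalised_difference(red, green)
print(ndti)
```

The index is bounded between -1 and 1 for non-negative reflectances, and it increases as red reflectance rises relative to green.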

### Description
In this example, we create a time series animation depicting a turbidity plume at the Murray Mouth and the surrounding coastline. 
The animations contrast [Sentinel-2](https://sentinel.esa.int/web/sentinel/missions/sentinel-2) imagery taken from early-2022 to mid-2023; this coincides with ['the flood of a generation'](https://www.environment.sa.gov.au/news-hub/news/articles/2023/06/the-flood-of-a-generation), peaking in mid-January 2023, leading to a stark decline in water quality within the vicinity. 
This example demonstrates how to:

1.  Load Sentinel-2 data
2.  Compute turbidity and water indices (NDTI, NDWI)
3.  Mask pixels which contain land and cloud 
4.  Create time series animations 
5.  Visualise changes in turbidity throughout a floodwater discharge event

***

## Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell. 

After finishing the analysis, return to the "Select location" and "Load Sentinel-2 data" cells, modify some values (e.g. choose a different location or time period to analyse) and re-run the analysis. There are additional instructions on modifying the notebook at the end.

#### Load packages
Load key Python packages and supporting functions for the analysis.

In [None]:
%matplotlib inline

from IPython.display import Image, Video
import datacube
import matplotlib.pyplot as plt
from datacube.utils import masking

import sys
sys.path.insert(1, "../Tools/")
from dea_tools.plotting import rgb, display_map, xr_animation
from dea_tools.bandindices import calculate_indices
from dea_tools.dask import create_local_dask_cluster
from dea_tools.datahandling import load_ard

# Connect to dask
client = create_local_dask_cluster(return_client=True)

#### Connect to the datacube
Activate the datacube database, which provides functionality for loading and displaying stored Earth observation data.

In [None]:
dc = datacube.Datacube(app='Turbidity_Animated_Timeseries')

#### Select location

The selected latitude and longitude will be displayed as a red box on the map below the next cell. 
This map can be used to find the coordinates of other places: scroll to the area of interest and click on any point to display the latitude and longitude of that location.

In [None]:
lat_range = (-35.51, -35.73)
lon_range = (138.506, 139.028)

display_map(x=lon_range, y=lat_range)

## Load Sentinel-2 data
The first step in this analysis is to load in Sentinel-2 optical data for the `lat_range`, `lon_range` and the desired `time` range. Note that only the necessary bands have been added to reduce load times. 
The `load_ard` function is used here to load data that has undergone geometric correction and surface reflectance correction, making it ready for analysis.
More information on ARD and the Open Data Cube is available [here.](https://docs.dea.ga.gov.au/notebooks/Beginners_guide/02_DEA.html#Open-Data-Cube)

In [None]:
# Load satellite data from datacube
query = {
    'y': lat_range,
    'x': lon_range,
    "time": ("2022-02-21", "2023-06-01"),
    "measurements": 
        ["nbart_red",
        "nbart_green",
        "nbart_blue",
        "nbart_nir_1",
        "oa_s2cloudless_prob",
        "oa_s2cloudless_mask"],
    "output_crs": "EPSG:3577",
    "resolution": (-30, 30),
}

# Load available data from both Sentinel-2 satellites
ds = load_ard(
    dc=dc,
    products=["ga_s2am_ard_3", "ga_s2bm_ard_3"],
    dask_chunks={},
    min_gooddata=0.85,
    mask_pixel_quality=False,
    group_by="solar_day",
    **query
)

**Examine the data** by printing it in the next cell.
The `Dimensions` attribute reveals the number of time steps in the data set, as well as the number of pixels in the `x` (longitude) and `y` (latitude) dimensions.

In [None]:
ds

**Print each time step** and its associated image capture date to familiarise yourself with the data.

In [None]:
for i, time in enumerate(ds.time.values):
    print(f"Time step {i}: {time}")

#### Plot an example timestep in true colour
To visualise the data, use the pre-loaded `rgb` utility function to plot a true colour image for a given time-step. 

Change the value for `timestep` to explore the data.


In [None]:
# Set the timestep to visualise
timestep = 20

# Generate RGB plot at the desired timestep
rgb(ds, 
    index=timestep,
    percentile_stretch=(0.05, 0.98))

## Calculate the NDTI index
First, calculate the NDTI index mentioned in the 'Background' section. 
Note this index is not yet available in `dea_tools.bandindices`.
Then visualise some imagery.

When it comes to interpreting the index, **high values represent pixels with high turbidity**, while **low values represent pixels with low turbidity**. 

In [None]:
# Calculate NDTI
ds["NDTI"] = (ds.nbart_red - ds.nbart_green) / (ds.nbart_red + ds.nbart_green)

In [None]:
# Visualise this
ts = ds["NDTI"].isel(time=timestep)
fig, ax = plt.subplots(figsize=(15, 10))
plt.imshow(ts, cmap="viridis")
plt.colorbar(label="NDTI", shrink=0.64)
plt.show()

## Compute Normalised Difference Water Index
When the turbidity index is applied to the full scene, land pixels produce extremely high NDTI values, which compresses the raster's contrast stretch over water. We therefore need to mask out the pixels associated with land.  
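
The effect of land pixels on the contrast stretch can be sketched with synthetic numbers. The NDTI ranges below are assumptions chosen only to illustrate the point (water values low, land values high), and `colour_range_fraction` is a hypothetical helper, not part of `dea_tools`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic NDTI values (illustrative only, not real data)
water = rng.uniform(-0.30, -0.10, size=500)
land = rng.uniform(0.20, 0.60, size=500)
scene = np.concatenate([water, land])


def colour_range_fraction(values, subset):
    """Fraction of the min-to-max range of `values` that is spanned by `subset`."""
    full = np.nanmax(values) - np.nanmin(values)
    return (np.nanmax(subset) - np.nanmin(subset)) / full


# With land included, the colour scale must cover both water and land values
unmasked = colour_range_fraction(scene, water)
# With land removed, the colour scale covers the water values only
masked = colour_range_fraction(water, water)

print(f"Water uses {unmasked:.0%} of the colour range before masking")
print(f"Water uses {masked:.0%} of the colour range after masking")
```

Before masking, the water pixels occupy only a small part of the colour scale, so subtle differences in turbidity are hard to see. After masking, the whole scale is available for water.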

To do this, we can use our Sentinel-2 data to calculate a water index called the 'Normalised Difference Water Index', or [NDWI](https://custom-scripts.sentinel-hub.com/custom-scripts/sentinel-2/ndwi/). 
This index uses the ratio of green and near-infrared radiation to identify the presence of water.
The formula is as follows:

$$
\begin{aligned}
\text{NDWI} &= \frac{(\text{Green} - \text{NIR})}{(\text{Green} + \text{NIR})}
\end{aligned}
$$

When it comes to interpreting the index, **high values (greater than 0) typically represent water pixels**, while **low values (less than 0) represent land**. 
Use the cell below to calculate the index; the cells that follow plot a selection of the resulting images.

In [None]:
# Calculate the water index
ds = calculate_indices(ds, index="NDWI", collection="ga_s2_3")

#### Visualise how environmental factors impact the NDWI values
Plot a representative sub-sample of `time` steps with a variety of environmental factors (e.g. extreme turbidity, low turbidity, cloud, wave break). 

In [None]:
# Plot a representative subset of the dataset (ds)
ds["NDWI"].isel(time=[1, 7, 20, 26]).plot.imshow(cmap="gnuplot", col="time", col_wrap=2, figsize=(12, 6))

**Question:** How do extremely turbid water, wave break, and cloud affect NDWI values?

## Mask out the land
Experiment with a `landmask_threshold` to isolate just water body pixels and visualise the results.
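
To see what a threshold does before applying it to real imagery, the sketch below masks a tiny synthetic scene with NumPy. The pixel values and the helper `mask_land` are assumptions for illustration; `np.where` is used here as a stand-in for the xarray `.where()` method, which likewise keeps values where the condition is true and sets the rest to NaN.

```python
import numpy as np

# Synthetic 2 x 3 scene (illustrative values only): mostly water on the left, land on the right
ndwi = np.array([[0.60, 0.40, -0.50],
                 [0.55, -0.02, -0.60]])
ndti = np.array([[-0.20, -0.10, 0.30],
                 [-0.25, 0.05, 0.35]])


def mask_land(values, ndwi, threshold):
    """Keep `values` where NDWI exceeds `threshold`; set everything else to NaN."""
    return np.where(ndwi > threshold, values, np.nan)


# Compare how many pixels survive under two candidate thresholds
for threshold in (0.0, -0.05):
    masked = mask_land(ndti, ndwi, threshold)
    kept = np.count_nonzero(~np.isnan(masked))
    print(f"threshold={threshold:+.2f}: {kept} of {ndti.size} pixels kept")
```

A slightly negative threshold retains pixels whose NDWI sits just below zero, which may include very turbid water, at the cost of possibly retaining some shoreline pixels.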

In [None]:
# Trial land masking thresholds
landmask_threshold = -0.05
# Keep the time dimension so that any `timestep` can be inspected in the next cell
dslmask_trial = ds["NDWI"].where(ds["NDWI"] > landmask_threshold)

In [None]:
# Visualise this
ts = dslmask_trial.isel(time=timestep)
fig, ax = plt.subplots(figsize=(15, 10))
plt.imshow(ts, cmap="gnuplot")
plt.colorbar(label="NDWI", shrink=0.64)
plt.show()

Once an appropriate threshold has been established, create a new dataset named `ds_lmask`.

In [None]:
# Mask the dataset
ds_lmask = ds.where(dslmask_trial > landmask_threshold)

## Cloud masking with `s2cloudless`
When working with Sentinel-2 data, users can apply a Sentinel-2 specific cloud mask, `s2cloudless`, [a machine learning cloud mask developed by Sinergise's Sentinel-Hub](https://github.com/sentinel-hub/sentinel2-cloud-detector).

We can also easily load and inspect `s2cloudless`'s cloud probability layer, which gives the likelihood of a pixel containing cloud.
To do this, first choose a representative sample of the dataset, ranging from cloud-free imagery to heavily cloud-affected imagery.
Then plot the `s2cloudless` probability band for the same sample and decide on a masking threshold to apply to the dataset `ds_lmask`; the masked result will be named `ds_clmask` ('c' stands for cloud, and 'l' stands for land).


In [None]:
# Plot a representative sample of ds_lmask rgb
rgb(ds_lmask.isel(time=[0, 9, 5, 3]), percentile_stretch=(0.2, 0.85), col="time", col_wrap=2)

In [None]:
# Plot the same sample of ds_lmask, cloud probability
ds_lmask["oa_s2cloudless_prob"].isel(time=[0, 9, 5, 3]).plot.imshow(cmap="gnuplot", col="time", col_wrap=2, figsize=(12, 6))

**Experiment with the cloud mask**. Change the `cloudmask_threshold` value and see how it affects the imagery; examine multiple values of `timestep`.
It is not imperative to remove every single cloud-affected pixel; doing so would likely also mask out highly turbid pixels.
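
The trade-off can be sketched by sweeping a few thresholds over synthetic cloud probabilities. The values below are invented for illustration, including the assumption that turbid water receives a moderately elevated cloud probability; `retained_fraction` is a hypothetical helper.

```python
import numpy as np

# Synthetic cloud probabilities (illustrative only):
# clear water, clear water, turbid water, turbid water, thin cloud, thick cloud
cloud_prob = np.array([0.02, 0.10, 0.45, 0.70, 0.95, 0.99])


def retained_fraction(prob, threshold):
    """Fraction of pixels kept when pixels with probability >= threshold are masked."""
    return float(np.mean(prob < threshold))


# A lower threshold masks more aggressively and removes more non-cloud pixels
for threshold in (0.5, 0.9, 0.98):
    kept = retained_fraction(cloud_prob, threshold)
    print(f"threshold={threshold}: {kept:.0%} of pixels kept")
```

In this toy example a strict threshold also removes the pixels labelled as turbid water, which is why a lenient threshold can be preferable when the plume itself is the feature of interest.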

In [None]:
# Set cloudmask threshold
cloudmask_threshold = 0.98

# Set the timestep to visualise
timestep = 5

# Apply 'cloudmask_threshold' to create the trial dataset 'ds_clmask_trial'
ds_clmask_trial = ds_lmask["oa_s2cloudless_prob"].where(ds["oa_s2cloudless_prob"] < cloudmask_threshold)

# Visualise this
ts = ds_clmask_trial.isel(time=timestep)
fig, ax = plt.subplots(figsize=(15, 10))
plt.imshow(ts, cmap="gnuplot")
plt.colorbar(label="cloud_probability", shrink=0.64)
plt.show()

Now that an appropriate threshold has been established, create a new dataset named `ds_clmask`.

In [None]:
# Once satisfied with the threshold, apply the cloud mask to the dataset
ds_clmask = ds_lmask.where(ds_clmask_trial < cloudmask_threshold)

## Visualise and interrogate the imagery
At this point we will plot all the masked data in both RGB and NDTI.

Now that **both land and cloud have been masked**, the masked pixels hold 'null' values. This is problematic because null values appear white, the same colour as clouds, which can cause confusion when displaying in RGB. 
To rectify this, set 'null' values to '0'.

In [None]:
# Manually set the null values equal to zero so they don't appear white like clouds
ds_clmask_zero = ds_clmask.where(~ds_clmask.isnull(), 0)

In [None]:
# Generate RGB plot at the desired timestep
rgb(ds_clmask_zero, 
    index=timestep,
    percentile_stretch=(0.1, 0.975))

**Now plot all the imagery** both in RGB and just the NDTI.

Consider which images you want in your final animations. 
If an image remains too cloud-affected or there are severe seamline effects (boundaries between adjacent image tiles, as observed in timesteps 0 and 1), it might be worth removing these images. 
For now make a note of all the images you want to retain.

**This may take some time to plot**.

In [None]:
# Plot the data so you can determine if there are any images you don't want in your animated time series
rgb(ds_clmask_zero, col="time", col_wrap=3, percentile_stretch=(0.1, 0.975))

In [None]:
# Plot the NDTI data with the mask so you can determine if there are any images you don't want in your animated time series
ds_clmask["NDTI"].plot(col="time", col_wrap=3, cmap="viridis")

#### Create final datasets with only the images which will be included in the final animations

In the next two cells create two datasets for two separate purposes.
**`ds_final_imgs`** will be used to create an animation of the NDTI over waterbodies only, and an RGB animation with land and cloud masked to increase the contrast stretch over the waterbodies. 
Create another dataset, **`ds_final_imgs_no_mask`**, without the masks applied, to display the mosaicked imagery in RGB in its original state. 
In this case it is important that both datasets contain the same timesteps for ease of comparison. 
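
One way to keep the two selections in step is to define the list of indices once and check the resulting time coordinates programmatically. The sketch below uses a hypothetical array of dates in place of `ds.time.values`, and the variable name `keep` is an assumption for illustration.

```python
import numpy as np

# Hypothetical acquisition dates standing in for `ds.time.values`
times = np.arange("2022-03-01", "2022-03-11", dtype="datetime64[D]")

# Define the selection once and reuse it for both datasets
keep = [2, 5, 6, 7]
masked_times = times[keep]    # stands in for ds_clmask.isel(time=keep).time.values
unmasked_times = times[keep]  # stands in for ds.isel(time=keep).time.values

# Fail loudly if the two selections ever drift apart
assert np.array_equal(masked_times, unmasked_times), "Timesteps differ between datasets"
print(masked_times)
```

Reusing a single index list avoids the risk of editing one of the two hard-coded lists and forgetting the other.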

In [None]:
# Create a dataset with only the desired time steps to be utilised in an animation when a cloud and land mask is needed.
ds_final_imgs = ds_clmask.isel(time=[2,5,6,7,8,9,13,14,15,16,17,18,19,20,21,22,23,24,25,28])

In [None]:
# Create a dataset with only the desired time steps to be utilised in an animation when no mask is needed.
ds_final_imgs_no_mask = ds.isel(time=[2,5,6,7,8,9,13,14,15,16,17,18,19,20,21,22,23,24,25,28])

**Print both datasets** to familiarise yourself with the data and ensure the timesteps are the same in both.

In [None]:
# Print and investigate the index of each time step in ds_final_imgs_no_mask
for i, time in enumerate(ds_final_imgs_no_mask.time.values):
    print(f"Time step {i}: {time}")

In [None]:
# Print and investigate the index of each time step in ds_final_imgs
for i, time in enumerate(ds_final_imgs.time.values):
    print(f"Time step {i}: {time}")

## Produce the animations
Create animations with `xr_animation` for:

1.  RGB unmasked imagery
2.  RGB land and cloud masked, and contrast stretched
3.  NDTI land and cloud masked imagery

Consider adapting `annotation_kwargs` to change the typography of the label.

`show_text` sets the text of the label.

`interval` sets the delay between frames (units = milliseconds).

`width_pixels` changes the size of the display.
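
When choosing `interval`, it can help to work out the resulting playback speed and length. The sketch below is plain arithmetic and assumes each frame is shown for exactly `interval` milliseconds; `animation_timing` is a hypothetical helper, and the frame count of 20 matches the number of timesteps selected above.

```python
def animation_timing(n_frames, interval_ms):
    """Return (frames per second, total duration in seconds) for a fixed frame interval."""
    fps = 1000 / interval_ms
    duration_s = n_frames * interval_ms / 1000
    return fps, duration_s


# 20 selected timesteps, each displayed for 2000 milliseconds
fps, duration_s = animation_timing(n_frames=20, interval_ms=2000)
print(f"{fps} frames per second, {duration_s} seconds in total")
```

Halving `interval` doubles the playback speed and halves the total duration.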


In [None]:
# Produce time series animation of RGB (unmasked)
xr_animation(ds=ds_final_imgs_no_mask,
             bands=['nbart_red', 'nbart_green', 'nbart_blue'],
             output_path='RGB_Murray_Mouth_Plume.mp4',
             annotation_kwargs={'fontsize': 40, 'color':'white'}, 
             show_text='RGB',
             interval=2000,
             width_pixels=1300)

# Plot animation
plt.close()
Video('RGB_Murray_Mouth_Plume.mp4', embed=True)

In [None]:
# Produce time series animation of RGB with land and cloud masked and contrast stretched.
xr_animation(ds=ds_final_imgs,
             bands=['nbart_red', 'nbart_green', 'nbart_blue'],
             percentile_stretch=(0.1, 0.975),
             output_path='RGB_Land_Masked_Murray_Mouth_Plume.mp4',
             show_text='RGB contrast stretched - Land Masked',
             annotation_kwargs={'fontsize': 40}, 
             interval=2000,
             width_pixels=1300)

# Plot animation
plt.close()
Video('RGB_Land_Masked_Murray_Mouth_Plume.mp4', embed=True)

In [None]:
# Produce time series animation of NDTI land and cloud masked.
xr_animation(ds=ds_final_imgs,
             output_path='NDTI_Murray_Mouth_Plume.mp4',
             bands='NDTI',
             show_text='NDTI - Land Masked',
             interval=2000,
             width_pixels=1300,
             annotation_kwargs={'fontsize': 40, 'color':'black'},
             imshow_kwargs={'cmap': 'viridis'},)

# Plot animation
plt.close()
Video('NDTI_Murray_Mouth_Plume.mp4', embed=True)

## Drawing conclusions

Here are some questions to think about:

* What can you conclude about the magnitude and timing of the Murray River flood discharge? 
* Which portions of the coast were most affected by this event? 
* What are some of the limitations of this method of assessing river-plume turbidity? 

## Next steps
When you are done, return to the "Select location" and "Load Sentinel-2 data" cells, modify some values (e.g. `time`, `lat_range`/`lon_range`) and rerun the analysis. 

If you change the location, you'll need to make sure Sentinel-2 data is available for the new location, which you can check at the [DEA Explorer](https://explorer.sandbox.dea.ga.gov.au/products/ga_s2am_ard_3) (use the drop-down menu to view all Sentinel-2 products). 
