# USGS Landsat Collection 2 using datacube <img align="right" src="../resources/csiro_easi_logo.png">

#### Index
- [Overview](#Overview)
- [Setup (imports, S3 access, dask)](#Setup)
- [Example query](#Example-query)
- [Read from the ODC database](#2.-Read-from-the-ODC-database)
- [Apply cloud masking](#Apply-cloud-masking)
- [Scaling and nodata](#Scaling-and-nodata)
- [Annual median NDVI](#Annual-median-NDVI)
- [Visualisation](#Visualisation)

## Overview

This notebook shows how to search the local ODC database index of Landsat data and load selected datasets into an xarray Dataset. Scaling and pixel quality masking are then applied using `datacube` and `odc-algo` functions.

## USGS Landsat

Landsat-5, 7, 8 and 9 [Collection 2](https://www.usgs.gov/landsat-missions/landsat-collection-2) products are managed by USGS. USGS makes Landsat data available via a number of services, including:

- [Earth Explorer](https://earthexplorer.usgs.gov) - USGS data browser and viewer
- [Landsat Look](https://landsatlook.usgs.gov/stac-browser/collection02) - Landsat scene browser (STAC)
- [AWS OpenData](https://registry.opendata.aws/usgs-landsat) - Cloud-hosted data (STAC)
- [ESPA](https://espa.cr.usgs.gov) - USGS On-demand processing
- Google Earth Engine
- Microsoft Planetary Computer

#### Data source and documentation

[Surface Reflectance](https://www.usgs.gov/landsat-missions/landsat-collection-2-level-2-science-products), [Surface Temperature](https://www.usgs.gov/landsat-missions/landsat-collection-2-level-2-science-products) and [Level-1 (top of atmosphere)](https://www.usgs.gov/landsat-missions/landsat-collection-2-level-1-data) products for each of [Landsat-5, 7, 8 and 9](https://www.usgs.gov/landsat-missions/landsat-satellite-missions) are available.

## Open Data Cube

EASI Asia ODC product names ([Explorer](http://explorer.asia.easi-eo.solutions/products)):
| Name | Product | Information
|--|--|--|
| USGS Landsat surface reflectance | landsat5_c2l2_sr | Landsat 5 Collection 2 Level-2 Surface Reflectance Product. 30m UTM based projection |
| | landsat7_c2l2_sr | Landsat 7 USGS Collection 2 Surface Reflectance, processed using LEDAPS. 30m UTM based projection |
| | landsat8_c2l2_sr | Landsat 8 Collection 2 Surface Reflectance, processed using LaSRC. 30m UTM based projection |
| | landsat9_c2l2_sr | Landsat 9 Collection 2 Surface Reflectance, processed using LaSRC. 30m UTM based projection |
| USGS Landsat surface temperature | landsat5_c2l2_st | Landsat 5 Collection 2 Level-2 UTM Surface Temperature (ST) Product |
| | landsat7_c2l2_st | Landsat 7 Collection 2 Level-2 UTM Surface Temperature (ST) Product |
| | landsat8_c2l2_st | Landsat 8 Collection 2 Level-2 UTM Surface Temperature (ST) Product |
| | landsat9_c2l2_st | Landsat 9 Collection 2 Level-2 UTM Surface Temperature (ST) Product |
| USGS Landsat Level-1 (TOA) | landsat8_c2l1 | Landsat 8 Collection 2 Level-1 (top of atmosphere) |
| | landsat9_c2l1 | Landsat 9 Collection 2 Level-1 (top of atmosphere) |

#### EASI pipeline

Landsat product metadata are read from the [USGS STAC API](https://landsatlook.usgs.gov/stac-server/), and the data themselves are Cloud Optimized GeoTIFFs (COGs) hosted on AWS. There are two ways to load them, each returning an essentially equivalent `xarray Dataset`; this notebook demonstrates the second:
1. Read from the STAC catalog (uses [odc-stac](https://github.com/opendatacube/odc-stac))
1. Read from the `datacube` database, which has a "cached" copy of the scenes and metadata (uses [odc-tools](https://github.com/opendatacube/odc-tools/blob/develop/apps/dc_tools/odc/apps/dc_tools/stac_api_to_dc.py))

Notes for using the AWS STAC API:
- Requires `requester_pays = True`
- AWS source region is `us-west-2` (consider egress and latency)
- Use EASI `caching-proxy` settings (to help reduce egress and latency costs)
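These notes correspond to GDAL configuration options, which is broadly what `configure_s3_access(requester_pays=True)` and the environment variables in the next cell set up. A minimal sketch, assuming the options are passed to `rasterio.Env`; the helper name `landsat_gdal_options` is hypothetical, and the proxy address is the one used later in this notebook.

```python
def landsat_gdal_options(use_proxy: bool = True) -> dict:
    """Return GDAL config options for reading USGS Landsat COGs on AWS (sketch)."""
    opts = {
        # The USGS Landsat bucket is requester-pays
        "AWS_REQUEST_PAYER": "requester",
        # Region where the data are hosted
        "AWS_REGION": "us-west-2",
    }
    if use_proxy:
        # Assumption: plain HTTP is used so the caching proxy can cache requests
        opts["AWS_HTTPS"] = "NO"
        opts["GDAL_HTTP_PROXY"] = "easi-caching-proxy.caching-proxy:80"
    return opts


# Usage (requires rasterio and valid AWS credentials):
# import rasterio
# with rasterio.Env(**landsat_gdal_options()):
#     ...
```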

## Setup

In [None]:
%matplotlib inline

# Data tools
import os, sys
import rasterio
import numpy as np
import xarray as xr
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt

# Formatting options
pd.set_option("display.max_rows", None)
# plt.rcParams['figure.figsize'] = [12, 8]

# Datacube
import datacube
from datacube.utils import masking
from dea_tools.plotting import display_map, rgb
from dea_tools.datahandling import mostcommon_crs
from odc.algo import mask_cleanup, erase_bad, to_f32
# from odc.stac import configure_s3_access
from datacube.utils.rio import configure_s3_access

# EASI packages
repo = f'{os.environ["HOME"]}/eocsi-hackathon-2022'  # No easy way to get repo directory
if repo not in sys.path: sys.path.append(repo)
from tools.notebook_utils import xarray_object_size, initialize_dask, localcluster_dashboard

# Holoviews, Datashader and Bokeh
# import hvplot.pandas
# import hvplot.xarray
# import holoviews as hv
# import panel as pn
# import colorcet as cc
# import cartopy.crs as ccrs
# from datashader import reductions
# from holoviews import opts
# import geoviews as gv
# from holoviews.operation.datashader import rasterize
# hv.extension('bokeh', logo=False);

#### S3 access and Dask local cluster

In [None]:
configure_s3_access(requester_pays=True)

# Optional: use the EASI SE Asia caching-proxy service
# Requires patched URL functions for each of Landsat and Sentinel-2
os.environ["AWS_HTTPS"] = "NO"
os.environ["GDAL_HTTP_PROXY"] = "easi-caching-proxy.caching-proxy:80"
print(f'Will use caching proxy at: {os.environ.get("GDAL_HTTP_PROXY")}')

cluster, client = initialize_dask(use_gateway=False, workers=(1,7), wait=False)
if cluster: display(cluster)
if cluster is None or 'LocalCluster' in str(type(cluster)): display(localcluster_dashboard(client))

## Example query

In [None]:
# Vietnam - Ha Long
# latitude = (20.7, 21.1)
# longitude = (106.7, 107.2)
# time = ('2020-02-01', '2020-04-01')

# Fiji - exhausts JupyterHub memory due to the antimeridian
# latitude = (-17.1, -16.2)
# longitude = (178.2, 180.0)
# time = ('2020-02-01', '2020-02-20')

# PNG - Milne Bay
latitude = (-10.8, -10)
longitude = (149.7, 150.8)
time = ('2018-02-01', '2022-02-20')

# west, south, east, north
bbox = [longitude[0], latitude[0], longitude[1], latitude[1]]
crs = "epsg:3857"

display_map(longitude, latitude)

## 2. Read from the ODC database
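The next cell loads all scenes in the most common CRS of the matching datasets, so that scenes from neighbouring UTM zones are reprojected onto a single grid (the Milne Bay longitudes, 149.7 to 150.8, cross the boundary between UTM zones 55 and 56 at 150°E). The idea behind `mostcommon_crs` can be sketched as a simple count; this is an illustration assuming each dataset reports a CRS string, not the dea_tools implementation, and `most_common_crs` and the CRS list are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def most_common_crs(crs_list: Iterable[str]) -> Optional[str]:
    """Return the CRS that occurs most often, or None if there are no datasets."""
    counts = Counter(str(c) for c in crs_list)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical CRS list for scenes from two neighbouring UTM zones
native = most_common_crs(["EPSG:32655", "EPSG:32655", "EPSG:32656"])
```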

In [None]:
# Initialise a datacube connection
dc = datacube.Datacube()

# Select a product
product = 'landsat8_c2l2_sr'

# Build the load query. A simplified copy (x, y, time only) is used to find the most common CRS (this essentially calls find_datasets())
query = {
    'x': longitude,    # "x" axis bounds
    'y': latitude,      # "y" axis bounds
    'time': time,           # Any parsable date strings
    'measurements': ['blue', 'green', 'red', 'nir08', 'pixel_quality'], 
    'product': product,                     # Product name
    # 'output_crs': native_crs,               # EPSG code
    'resolution': (30, 30),                 # Target resolution
    'group_by': 'solar_day',                # Scene ordering
    'dask_chunks': {'x': 2048, 'y': 2048},  # Dask chunks
    'skip_broken_datasets': True,
}

simple = {k:v for k,v in query.items() if k in ('x', 'y', 'time')}

# Most common CRS
native_crs = mostcommon_crs(dc, query['product'], simple)
query['output_crs'] = native_crs
print(f'Native CRS: {native_crs}')

# Find datasets
items = dc.find_datasets(product=query['product'], **simple)
print(f"Found: {len(items):d} datasets")
# display(items)

# Load data
data = dc.load(**query)
display(xarray_object_size(data))
display(data)

## Apply cloud masking

Following [Cloud_and_pixel_quality_masking.ipynb](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks/blob/main/Frequently_used_code/Cloud_and_pixel_quality_masking.ipynb), we choose the "cloud mask filtered" method to remove false-positive cloud features before temporal aggregation.
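The two masking steps in the cells below can be sketched with plain NumPy. First, a pixel is flagged if any selected quality bit is set, which is what `(pixel_quality & mask) != 0` computes. Second, the filters remove small, isolated cloud detections with an opening (erosion then dilation) and then buffer the remaining clouds with a dilation. The bit positions (2 cirrus, 3 cloud, 4 cloud shadow) are assumed from the USGS Collection 2 QA_PIXEL layout rather than read from this product's `flags_definition`, and the square window is a simplification (odc-algo's own filters may use a different, e.g. disk-shaped, footprint). All names here are hypothetical.

```python
import numpy as np

# Assumed USGS Collection 2 QA_PIXEL bit positions
CIRRUS_BIT, CLOUD_BIT, SHADOW_BIT = 2, 3, 4
MASK = (1 << CIRRUS_BIT) | (1 << CLOUD_BIT) | (1 << SHADOW_BIT)  # 0b11100 = 28


def flag_mask(pixel_quality, mask=MASK):
    """True where any of the selected bits is set."""
    return (np.asarray(pixel_quality) & mask) != 0


def _window_or(mask, r):
    """OR of all shifts within a (2r+1) x (2r+1) square; outside the array counts as False."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def dilate(mask, r):
    return _window_or(mask, r)


def erode(mask, r):
    # Complement of dilating the complement; outside the array counts as True,
    # so the image border itself does not erode
    return ~_window_or(~mask, r)


def cleanup(mask, open_r=2, dilate_r=2):
    """Opening then dilation, mirroring [("opening", open_r), ("dilation", dilate_r)]."""
    opened = dilate(erode(mask, open_r), open_r)
    return dilate(opened, dilate_r)


# 21824 sets bits 6, 8, 10, 12, 14 (no cloud bits); 22280 also sets bit 3 (cloud)
flags = flag_mask(np.array([21824, 22280], dtype=np.uint16))

m = np.zeros((9, 9), dtype=bool)
m[1, 7] = True        # isolated speckle
m[3:8, 1:6] = True    # 5x5 cloud
opened = dilate(erode(m, 1), 1)           # speckle removed, 5x5 cloud kept
cleaned = cleanup(m, open_r=1, dilate_r=1)  # cloud buffered to 7x7
```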

In [None]:
flags_def = data.pixel_quality.attrs["flags_definition"]
flags_def

In [None]:
flags_def = data.pixel_quality.attrs["flags_definition"]

quality_flags = dict(
    cloud="high_confidence",         # True where there is cloud
    cirrus="high_confidence",        # True where there is cirrus cloud
    cloud_shadow="high_confidence",  # True where there is cloud shadow
)

# Bit mask covering the selected flags (the second return value is the matching bit values)
mask, _ = masking.create_mask_value(flags_def, **quality_flags)

# Add the cloud mask to our dataset: True where any selected flag is set, False otherwise
data['cloud_mask'] = (data['pixel_quality'] & mask) != 0

# Set the filters to apply. The integers are the filter radii, in pixels
filters = [("opening", 2), ("dilation", 2)]

# Use the mask_cleanup function to apply the filters
data['cloud_mask_filtered'] = mask_cleanup(data['cloud_mask'], mask_filters=filters)

# Set pixels under the filtered cloud mask to nodata, dropping the quality and mask bands
clear_filtered = erase_bad(data.drop_vars(['pixel_quality', 'cloud_mask_filtered', 'cloud_mask']),
                           data['cloud_mask_filtered'])

clear_filtered

## Scaling and nodata
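Collection 2 Level-2 surface reflectance bands are stored as scaled integers; USGS publishes a scale factor of 0.0000275 and an offset of -0.2, which the cell below applies. Surface temperature (`*_c2l2_st`) bands use a different scale (0.00341802) and offset (149.0), giving kelvin. Because of the offset, a nodata value of 0 would otherwise become -0.2, which is why nodata is converted to NaN before scaling. A minimal sketch of both conversions, assuming a nodata value of 0; `scale_band` is a hypothetical helper.

```python
import numpy as np

SR_SCALE, SR_OFFSET = 2.75e-5, -0.2       # surface reflectance (unitless)
ST_SCALE, ST_OFFSET = 0.00341802, 149.0   # surface temperature (kelvin)


def scale_band(dn, scale, offset, nodata=0):
    """Convert integer DNs to physical values, mapping nodata to NaN (sketch)."""
    values = np.asarray(dn, dtype=np.float32)
    values = np.where(values == nodata, np.nan, values)
    return values * scale + offset


sr = scale_band([0, 7273, 43636], SR_SCALE, SR_OFFSET)  # NaN, ~0.0, ~1.0
st = scale_band([10000], ST_SCALE, ST_OFFSET)           # ~183.18 K
```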

In [None]:
# Convert the data to `float32` and set no-data values to `NaN`:
clear_filtered = to_f32(clear_filtered)

# Apply the scale and offset factors to the Landsat data
clear_filtered = 2.75e-5 * clear_filtered - 0.2

## Annual median NDVI
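NDVI is the normalised difference of near-infrared and red reflectance, (NIR - Red) / (NIR + Red), so it ranges from -1 to 1. For example, NIR = 0.4 and Red = 0.1 give 0.3 / 0.5 = 0.6. A NumPy sketch of the same formula, with a hypothetical `ndvi` helper that returns NaN where the denominator is zero (the cell below computes it directly on the xarray variables and then takes the per-year median):

```python
import numpy as np


def ndvi(nir, red):
    """(NIR - Red) / (NIR + Red); NaN where the denominator is zero or inputs are NaN."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (nir - red) / denom
    return np.where(denom == 0, np.nan, out)


values = ndvi([0.4, 0.1, 0.0], [0.1, 0.1, 0.0])  # ~0.6, 0.0, NaN
```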

In [None]:
clear_filtered['ndvi'] = (clear_filtered.nir08 - clear_filtered.red)/(clear_filtered.nir08 + clear_filtered.red)

annual = clear_filtered.ndvi.groupby("time.year").median().persist()

In [None]:
# Quick check: annual median red-band surface reflectance
red = clear_filtered.red.groupby("time.year").median().persist()
red.plot(col="year", cmap="Reds", col_wrap=4, vmin=0, vmax=0.3);

In [None]:
# Plot the output summary images (slow)
annual.plot(col="year", cmap="YlGn", col_wrap=4, vmin=0, vmax=1.0);

## Visualisation
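The `rgb` call below (from dea_tools) is commented out. As an illustration of what a true-colour composite involves, this sketch stacks red, green and blue reflectance into a (y, x, 3) array and applies a single 2-98 percentile stretch across all bands, ignoring NaN (masked) pixels. The percentile limits and the `rgb_composite` name are assumptions, not dea_tools defaults.

```python
import numpy as np


def rgb_composite(red, green, blue, pmin=2.0, pmax=98.0):
    """Stack bands into (y, x, 3) and stretch to 0-1 using shared percentiles (sketch)."""
    stack = np.dstack([red, green, blue]).astype(np.float64)
    lo, hi = np.nanpercentile(stack, [pmin, pmax])
    if not hi > lo:
        # Flat or fully masked image: nothing to stretch
        return np.zeros_like(stack)
    stretched = (stack - lo) / (hi - lo)
    # Clip to the displayable range; NaN (masked) pixels stay NaN
    return np.clip(stretched, 0.0, 1.0)


# Usage with this notebook's data (one time step), e.g.:
# scene = clear_filtered.isel(time=0)
# img = rgb_composite(scene.red.values, scene.green.values, scene.blue.values)
# plt.imshow(np.nan_to_num(img))
```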

In [None]:
# rgb(clear_filtered, bands=['red', 'green', 'blue'], col="time", col_wrap=4)

In [None]:
##
## Note: the Geoviews/Cartopy projection from the native UTM CRS (e.g. epsg:32649) to PlateCarree
## does not work for this data (it does work for Sentinel-2 UTM data)
##

# Generate a plot

# options = {
#     'title': f'{query["product"]}: {layer_name}',
#     'width': 1000,
#     'height': 450,
#     'aspect': 'equal',
#     'cmap': cc.rainbow,
#     'clim': (0, 0.05),                          # Limit the color range depending on the layer_name
#     'colorbar': True,
#     'tools': ['hover'],
# }

# # Set the Dataset CRS
# plot_crs = native_crs
# if plot_crs == 'epsg:4326':
#     plot_crs = ccrs.PlateCarree()


# # Native data and coastline overlay:
# # - Comment `crs`, `projection`, `coastline` to plot in native_crs coords
# # TODO: Update the axis labels to 'longitude', 'latitude' if `coastline` is used
    
# layer_plot = layer.hvplot.image(
#     x = 'x', y = 'y',                        # Dataset x,y dimension names
#     rasterize = True,                        # Use Datashader
#     aggregator = reductions.mean(),          # Datashader selects mean value
#     precompute = True,                       # Datashader precomputes what it can
#     crs = plot_crs,                          # Dataset crs
#     projection = ccrs.PlateCarree(),         # Output projection (use ccrs.PlateCarree() when coastline=True)
#     coastline='10m',                         # Coastline = '10m'/'50m'/'110m'
# ).options(opts.Image(**options)).hist(bin_range = options['clim'])

# # display(layer_plot)
# # Optional: Change the default time slider to a dropdown list, https://stackoverflow.com/a/54912917
# fig = pn.panel(layer_plot, widgets={'time': pn.widgets.Select})  # widget_location='top_left'
# display(fig)