# HLS quality assessment notebook for QA-masked monthly median per-pixel reflectance

This notebook performs a basic quality assessment of a single tile (10TET), which covers Seattle, for 2018 by examining the Zarr output of the `calculate_job_median` function. It focuses on the output for the NIR_NARROW band in January and July.

## Get the basics set up

In [1]:
import os

# pip/conda installed
import fsspec
import pandas as pd
import xarray as xr

from utils import get_logger
from utils.hls.catalog import HLSBand
from utils.hls.catalog import HLSCatalog
from utils.hls import compute

In [2]:
logger = get_logger('hls-masked-monthly-median-qa')

In [3]:
# fill with your account key
os.environ['AZURE_ACCOUNT_KEY'] = ""

In [4]:
code_path = './utils'
cluster_args = dict(
    workers=4,
    worker_threads=1,
    worker_memory=8,
    scheduler_threads=1,
    scheduler_memory=4
)

# read each band of a tile as a single chunk (each tile is 3660x3660 pixels)
chunks = {'band': 1, 'x': 3660, 'y': 3660}
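
For a rough sense of what this chunking implies for worker memory, the sketch below estimates the size of one single-band, full-tile chunk. The bytes-per-pixel values are assumptions, since the notebook does not state the dtype the arrays are loaded as, and `chunk_mib` is a hypothetical helper rather than part of `utils`.

```python
# Rough per-chunk memory estimate for chunks = {'band': 1, 'x': 3660, 'y': 3660}.
# The dtypes listed are assumptions; the real dtype depends on how the data is loaded.
TILE_SIZE = 3660  # pixels per side, taken from the chunk spec above


def chunk_mib(bytes_per_pixel, side=TILE_SIZE):
    """Size in MiB of one single-band, full-tile chunk."""
    return side * side * bytes_per_pixel / 2**20


for name, nbytes in [("int16", 2), ("float32", 4), ("float64", 8)]:
    print(f"{name}: {chunk_mib(nbytes):.1f} MiB per chunk")
```

Under these assumptions a chunk comes to roughly 26, 51, and 102 MiB respectively, so a month of scenes for one band is the quantity to compare against the configured `worker_memory`.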

## Get the catalog and cluster ready

We'll use a small, 4-worker Dask cluster and a 2018 catalog for the tile covering Seattle (10TET).

In [5]:
bands = [HLSBand.NIR_NARROW, HLSBand.QA]
seattle_df = pd.DataFrame([{'lat': 47.6062, 'lon': -122.3321, 'year': 2018}])
catalog = HLSCatalog.from_point_pandas(seattle_df, bands)
catalog.xr_ds

Reading tile extents...
Read tile extents for 56686 tiles


## Run the Job

This should take a minute or two to complete.

In [6]:
account_name = "usfs"
storage_container = "fia/hls/qa"
account_key = os.environ["AZURE_ACCOUNT_KEY"]
catalog_groupby = "tile"
job_groupby = "time.month"
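
To illustrate what grouping a job by `"time.month"` means, here is a minimal sketch of a masked monthly median on synthetic data. It is not the implementation of `calculate_job_median`; the array values, the `clear` mask, and the variable names are all made up for illustration, and only standard xarray calls (`where`, `groupby`, `median`) are used.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for one tile: 6 scenes over 2 months on a 2x2 grid.
times = pd.to_datetime([
    "2018-01-05", "2018-01-15", "2018-01-25",
    "2018-07-05", "2018-07-15", "2018-07-25",
])
scene_values = np.array([0.1, 0.2, 0.9, 0.3, 0.4, 0.5])
nir = xr.DataArray(
    scene_values[:, None, None] * np.ones((6, 2, 2)),
    dims=("time", "y", "x"),
    coords={"time": times},
    name="NIR_NARROW",
)

# Hypothetical "usable" mask, per scene for brevity (a real QA mask is per pixel).
clear = xr.DataArray(
    [True, True, False, True, True, True],
    dims="time",
    coords={"time": times},
)

masked = nir.where(clear)  # flagged scenes become NaN
monthly = masked.groupby("time.month").median("time")  # NaNs are skipped by default
print(monthly.sel(month=1).values)
print(monthly.sel(month=7).values)
```

The result has a `month` dimension in place of `time`, which is why the quality assessment cells later select data with `sel(month=...)`.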

In [None]:
compute.process_catalog(
    catalog=catalog.xr_ds,
    catalog_groupby=catalog_groupby,
    job_fn=compute.calculate_job_median,
    job_groupby=job_groupby,
    chunks=chunks,
    account_name=account_name,
    storage_container=storage_container,
    account_key=account_key,
    checkpoint_path='qa_chk_pt.txt',
    logger=logger,
    cluster_args=cluster_args,
    code_path=code_path,
    concurrency=1,  # run one job at a time
    cluster_restart_freq=-1  # don't restart the cluster
)

2021-01-22 20:40:31,813 [INFO] hls-masked-monthly-median-qa - Starting cluster
2021-01-22 20:40:40,691 [INFO] hls-masked-monthly-median-qa - Cluster dashboard visible at /services/dask-gateway/clusters/default.8b3b8469d16b49eaa7f0b309eae7158d/status
2021-01-22 20:40:40,718 [INFO] hls-masked-monthly-median-qa - Uploading code to cluster
2021-01-22 20:40:40,721 [INFO] hls-masked-monthly-median-qa - Submitting job 10TET


## Quality assessment

1. Read the file from Azure blob storage
1. Examine the NIR_NARROW band in July: assert that the reflectance values are within bounds, then visually inspect the output
1. Examine the NIR_NARROW band in January: assert that the reflectance values are within bounds, then visually inspect the output

In [None]:
tile_path = fsspec.get_mapper(
    "az://fia/hls/qa/10TET.zarr",
    account_name="usfs",
    account_key=os.environ['AZURE_ACCOUNT_KEY']
)
tile = xr.open_zarr(tile_path)
tile

In [None]:
def assert_bounds(arr, minimum, maximum):
    """Print the min and max of `arr` and assert both lie within [minimum, maximum]."""
    mi = float(arr.min().values)
    ma = float(arr.max().values)
    print(mi, ma)
    assert minimum <= mi and ma <= maximum, f"Out of bounds: [{mi}, {ma}]"
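
A bounds check like this relies on how xarray reductions treat missing values. The small sketch below, on a made-up array, shows that `min()` and `max()` skip NaN by default for float data, so masked pixels do not by themselves cause the assertion to fail.

```python
import math

import numpy as np
import xarray as xr

# Made-up 1-D array with one missing value.
arr = xr.DataArray(np.array([0.1, np.nan, 0.8]), dims="x")

lo = float(arr.min())  # NaN is skipped by default for float data
hi = float(arr.max())
print(lo, hi)

# With skipna=False the NaN propagates instead.
strict_lo = float(arr.min(skipna=False))
print(math.isnan(strict_lo))
```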

In [None]:
# July in Seattle: shouldn't be missing much data
july = tile.sel(month=7)['NIR_NARROW'].persist()
assert_bounds(july, 0, 1)
july.fillna(0).plot.imshow(size=10)

In [None]:
# January in Seattle: expect a lot of missing data
jan = tile.sel(month=1)['NIR_NARROW'].persist()
assert_bounds(jan, 0, 1)
jan.fillna(-1).plot.imshow(size=10)
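
The expectations in the cell comments about missing data can also be checked numerically rather than only by eye. The following is a minimal sketch, assuming missing pixels are stored as NaN (as the `fillna` calls suggest); `missing_fraction` is a hypothetical helper, not part of `utils`.

```python
import numpy as np
import xarray as xr


def missing_fraction(arr):
    """Fraction of pixels in `arr` that are NaN (hypothetical helper)."""
    return float(arr.isnull().mean())


# Synthetic 2x2 example with one missing pixel out of four.
demo = xr.DataArray(np.array([[0.2, np.nan], [0.4, 0.6]]), dims=("y", "x"))
print(missing_fraction(demo))  # 0.25
```

Applied to the arrays above, `missing_fraction(july)` and `missing_fraction(jan)` would give two numbers to compare, with January expected to be the larger.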