# HLS - DEV

Steps:

* scene selection - TODO
* convert HDF to TIF
* create clear-sky band from QA

In [23]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import geopandas as gpd
import pandas as pd
from pathlib import Path
import rasterio

import nasa_hls

from src import configs
prjconf = configs.ProjectConfigParser()
tilenames = prjconf.get("Params", "tiles").split(" ")



## Scene selection - TODO

**TODO: Find a solution for the following problem. For now, we simply process all downloaded files.**

We first downloaded all the data to obtain the cloud cover and spatial coverage of each scene.

Now we only want to process scenes whose cloud cover is below a given threshold and whose spatial coverage is above a given threshold.

To keep this simple in Snakemake, i.e. to be able to work with plain wildcards, we first create a rule that creates the output directories of the datasets we want to produce.
These output directories then serve as input to the actual processing rule.

(**Note:** There is a section about [data-dependent conditional execution](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#data-dependent-conditional-execution) which might (?) also be an option.)
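Once the scene directories exist, the downstream targets can be derived from them. Below is a minimal sketch in plain Python, assuming the directory layout `data/interim/hls/{tile}/{sceneid}` used in this notebook; `selected_scene_ids` and `target_paths` are hypothetical helpers, not part of `src` or `nasa_hls`, and the second scene ID in the example is made up. Note that a Snakefile doing this kind of listing only sees directories that already exist when the workflow is parsed, which is the limitation the data-dependent conditional execution linked above is meant to address.

```python
import tempfile
from pathlib import Path


def selected_scene_ids(basedir_tile):
    """Return the names of all scene directories below a tile directory (sorted)."""
    return sorted(p.name for p in Path(basedir_tile).iterdir() if p.is_dir())


def target_paths(basedir_tile, band):
    """Build the {sceneid}/{sceneid}__{band}.tif target for every selected scene."""
    basedir_tile = Path(basedir_tile)
    return [basedir_tile / sid / f"{sid}__{band}.tif"
            for sid in selected_scene_ids(basedir_tile)]


# Example with a throw-away tile directory: two scene folders (the S30 one is
# made up) and one stray file
with tempfile.TemporaryDirectory() as tmp:
    tile_dir = Path(tmp) / "32UNU"
    for sid in ["HLS.S30.T32UNU.2018005.v1.4", "HLS.L30.T32UNU.2018003.v1.4"]:
        (tile_dir / sid).mkdir(parents=True)
    (tile_dir / "notes.txt").write_text("not a scene")  # ignored: not a directory

    example_ids = selected_scene_ids(tile_dir)
    example_targets = [p.relative_to(tile_dir).as_posix()
                       for p in target_paths(tile_dir, "CLEAR")]

print(example_ids)
```

Deriving the targets from the file system keeps the selection logic in a single place (the rule that creates the directories) rather than duplicating the thresholds in the processing rule.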

In [24]:
tile = "32UNU"

basedir__hls_tif = prjconf.get_path("Interim", "hls") / tile

df_scenes = pd.read_csv(prjconf.get_path("Raw", "hls_tile_lut", tile=tile))

# thresholds are read as strings from the config file
max_cloud_cover = float(prjconf.get("Params", "max_cloud_cover"))
min_spatial_cover = float(prjconf.get("Params", "min_spatial_cover"))

# keep scenes that are clear enough AND cover enough of the tile;
# NOTE: the spatial coverage column is assumed to be named "spatial_coverage"
# (check the LUT header); it must not be compared against "cloud_cover"
df_scenes_sel = df_scenes[(df_scenes["cloud_cover"] <= max_cloud_cover) &
                          (df_scenes["spatial_coverage"] >= min_spatial_cover)]

# create one output directory per selected scene
for sid in df_scenes_sel["sceneid"]:
    basedir_hls_scene = basedir__hls_tif / sid
    basedir_hls_scene.mkdir(parents=True, exist_ok=True)

## Convert HDF to TIF

An HDF file is a stack of bands.
Each HDF file is converted to several single-band TIF files.

Input file with wildcards and example:

    data/raw/hls/{tile}/{sceneid}.hdf
    
    data/raw/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4.hdf
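The scene ID itself encodes the tile, so the `{tile}` wildcard can be cross-checked against `{sceneid}`. The sketch below assumes the HLS v1.4 naming scheme visible in the example, `HLS.<product>.T<tile>.<YYYYDDD>.v<version>`, where the product is `L30` (Landsat) or `S30` (Sentinel-2) and `YYYYDDD` is the year followed by the day of year; `parse_sceneid` is a hypothetical helper.

```python
import re
from datetime import date, timedelta
from pathlib import Path

# Assumed HLS v1.4 naming scheme: HLS.<L30|S30>.T<tile>.<YYYYDDD>.v<major.minor>
SCENEID_PATTERN = re.compile(
    r"^HLS\.(?P<product>[LS]30)\.T(?P<tile>[0-9A-Z]{5})"
    r"\.(?P<year>\d{4})(?P<doy>\d{3})\.v(?P<version>\d+\.\d+)$"
)


def parse_sceneid(sceneid):
    """Split an HLS scene ID into product, tile, acquisition date and version."""
    match = SCENEID_PATTERN.match(sceneid)
    if match is None:
        raise ValueError(f"not an HLS scene ID: {sceneid!r}")
    # day of year 1 is January 1st
    acquired = date(int(match["year"]), 1, 1) + timedelta(days=int(match["doy"]) - 1)
    return {"product": match["product"], "tile": match["tile"],
            "date": acquired, "version": match["version"]}


# The scene ID is the HDF file name without the .hdf suffix
hdf_file = Path("data/raw/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4.hdf")
info = parse_sceneid(hdf_file.stem)
assert info["tile"] == hdf_file.parent.name  # tile folder and scene ID agree
print(info)
```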

Output files with wildcards and examples:

    data/interim/hls/{tile}/{sceneid}/{sceneid}__{band}.tif
    
    data/interim/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4/HLS.L30.T32UNU.2018003.v1.4__Red.tif
    data/interim/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4/HLS.L30.T32UNU.2018003.v1.4__NIR.tif
    data/interim/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4/HLS.L30.T32UNU.2018003.v1.4__SWIR1.tif
    data/interim/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4/HLS.L30.T32UNU.2018003.v1.4__SWIR2.tif
    data/interim/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4/HLS.L30.T32UNU.2018003.v1.4__QA.tif

However, it is easier to specify just the output folder as output.

    directory("data/interim/hls/{tile}/{sceneid}")
    
    data/interim/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4

**TODO**: Change the output to specify all files. How to do this correctly?
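One possible direction for this TODO: since the band list is known from the config, the individual outputs can be enumerated from the wildcard pattern instead of declaring only the directory. The plain-Python sketch below does this with `str.format`; in a Snakefile, `expand()` with doubled braces (e.g. `{{tile}}`, `{{sceneid}}`) keeps those wildcards open while filling in `{band}`. `band_outputs` is a hypothetical helper, and the band names are taken from the example outputs listed above.

```python
from pathlib import Path

# Output pattern from above; {sceneid} appears twice and is filled both times
OUTPUT_PATTERN = "data/interim/hls/{tile}/{sceneid}/{sceneid}__{band}.tif"


def band_outputs(tile, sceneid, bands, pattern=OUTPUT_PATTERN):
    """List the single-band TIF paths that one HDF conversion should produce."""
    return [Path(pattern.format(tile=tile, sceneid=sceneid, band=band))
            for band in bands]


outputs = band_outputs("32UNU", "HLS.L30.T32UNU.2018003.v1.4",
                       ["Red", "NIR", "SWIR1", "SWIR2", "QA"])
for path in outputs:
    print(path)
```

Declaring every file explicitly should let Snakemake notice when a single band is missing, whereas a `directory()` output is only tracked as a whole.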

Rule development.

In [22]:
from pathlib import Path
from nasa_hls import convert_hdf2tiffs

hdf_path = prjconf.get_path("Raw", "hls") / "32UNU" / "HLS.L30.T32UNU.2018003.v1.4.hdf"
# => in the script: snakemake.input[0]
bands = prjconf.get("Params", "bands").split(" ")
# => in the script: snakemake.params.bands
dir__hls_tif = prjconf.get_path("Interim", "hls") / "32UNU" / "HLS.L30.T32UNU.2018003.v1.4"
# => in the script: snakemake.output[0]

# dstdir is the tile folder: as the returned path below shows, the scene
# subfolder (dir__hls_tif) is created inside it
convert_hdf2tiffs(hdf_path=hdf_path,
                  dstdir=Path(dir__hls_tif).parent,
                  bands=bands,
                  max_cloud_coverage=100)  # no cloud-based filtering here

PosixPath('/home/ben/Devel/Projects/classify-hls/data/interim/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4')

## Create clear-sky band

More details on converting the QA layer to a clear-sky mask can be found in [nasa_hls’s documentation - Create a clear-sky mask from the QA layer](https://benmack.github.io/nasa_hls/build/html/tutorials/Working_with_HLS_datasets_and_nasa_hls.html#Create-a-clear-sky-mask-from-the-QA-layer).
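Conceptually, the conversion marks a pixel as clear if its QA value is in a list of valid values. The following NumPy sketch is an illustrative re-implementation, not `nasa_hls`'s actual code; in particular, treating `keep_255` as "pass QA value 255 (assumed to be the fill value) through unchanged" is an assumption based on the parameter name.

```python
import numpy as np


def qa_to_clear(qa, qa_valid, keep_255=True):
    """Illustrative QA-to-clear-sky conversion (hypothetical, not nasa_hls's code).

    Returns 1 where the QA value is in qa_valid, else 0. With keep_255=True,
    pixels with QA value 255 (assumed fill value) are kept as 255.
    """
    qa = np.asarray(qa)
    clear = np.isin(qa, qa_valid).astype(np.uint8)
    if keep_255:
        clear[qa == 255] = 255
    return clear


# A 2 x 3 QA patch: 0, 4 and 64 are in the (shortened) valid list, 2 and 8 are not
qa_example = np.array([[0, 4, 2], [8, 64, 255]], dtype=np.uint8)
clear_example = qa_to_clear(qa_example, qa_valid=[0, 4, 16, 20, 64])
print(clear_example)
```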


Input file with wildcards and example:

    data/interim/hls/{tile}/{sceneid}/{sceneid}__QA.tif
    
    data/interim/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4/HLS.L30.T32UNU.2018003.v1.4__QA.tif

Output files with wildcards and examples:

    data/interim/hls/{tile}/{sceneid}/{sceneid}__CLEAR.tif
    
    data/interim/hls/32UNU/HLS.L30.T32UNU.2018003.v1.4/HLS.L30.T32UNU.2018003.v1.4__CLEAR.tif
    
Rule development.

In [29]:
from nasa_hls import hls_qa_layer_to_mask

path__qa = prjconf.get_path("Interim", "hls") / "32UNU" / "HLS.L30.T32UNU.2018003.v1.4" / "HLS.L30.T32UNU.2018003.v1.4__QA.tif"
# => in the script: snakemake.input[0]
path__clear = prjconf.get_path("Interim", "hls") / "32UNU" / "HLS.L30.T32UNU.2018003.v1.4" / "HLS.L30.T32UNU.2018003.v1.4__CLEAR.tif"
# => in the script: snakemake.output[0]

# QA values considered clear (taken from the nasa_hls documentation linked above)
valid = [  0,   4,  16,  20,  32,  36,  48,  52,  64,  68,  80,  84,  96,
         100, 112, 116, 128, 132, 144, 148, 160, 164, 176, 180, 192, 196,
         208, 212, 224, 228, 240, 244]

clear = hls_qa_layer_to_mask(qa_path=path__qa,
                             qa_valid=valid,
                             keep_255=True,
                             mask_path=path__clear,
                             overwrite=False)
assert clear == 0

Processing skipped. File exists.
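The hard-coded `valid` list follows a bit pattern. Assuming the HLS v1.4 QA bit layout (bit 0 cirrus, bit 1 cloud, bit 2 adjacent cloud, bit 3 cloud shadow, bit 4 snow/ice, bit 5 water, bits 6 and 7 aerosol quality), the list contains exactly those 8-bit values in which the cirrus, cloud and cloud-shadow bits are all 0, while the remaining bits are ignored. A sketch that derives the list instead of typing it out; the constant names are illustrative.

```python
# Assumed HLS v1.4 QA bits (bits 4-7: snow/ice, water, aerosol quality)
CIRRUS = 1 << 0
CLOUD = 1 << 1
ADJACENT_CLOUD = 1 << 2
CLOUD_SHADOW = 1 << 3

# A pixel is clear if none of these flags is set; all other bits are ignored
REJECT = CIRRUS | CLOUD | CLOUD_SHADOW  # 0b1011

valid_derived = [value for value in range(256) if (value & REJECT) == 0]
print(len(valid_derived), valid_derived[:4])  # 32 [0, 4, 16, 20]
```

Adding `ADJACENT_CLOUD` to `REJECT` would give a stricter mask with 16 valid values.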
