## Stitching

Earth Observation (EO) data is usually distributed as small scene files whose projections, resolutions, etc. differ across data providers, e.g. s3://modis-pds/ and s3://sentinel-s2-l1c/.

Before using this data, it is often necessary to combine scene files, re-arrange or combine bands, or change the projection and resolution of the dataset.

This module aims to make it easy to download and combine EO scene files, re-arrange bands, and change the projection or resolution.

The main class, `DataSet`, is implemented in the `dataset` module.

### Using a grid file
1. A grid file is a KML file or shapefile that maps the data provider's grid to world coordinates.

2. Each feature has a `Name` field identifying the cell's x and y indices, and its geometry gives that cell's footprint (and hence its bounding box). The file is used to build a fixed list of patterns, which are then used to search for scene files.

This is the recommended method.
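The grid-file approach can be sketched in plain Python. This is a minimal illustration, not the module's implementation: the in-memory `GRID` list, the `intersects` and `cells_for_bbox` helpers, and the rounded cell bounding boxes are all hypothetical stand-ins for the rows of a real KML file or shapefile. Each cell whose bounding box overlaps the area of interest contributes one zero-padded `x`/`y` pair, which can later fill the placeholders of a source pattern.

```python
import re

# Hypothetical grid: (Name, (min_lon, min_lat, max_lon, max_lat)) per cell.
# A real grid file stores a polygon per row; only its bounding box is used here,
# and these values are rounded, illustrative numbers.
GRID = [
    ("h:19 v:4", (13.0, 40.0, 26.0, 50.0)),
    ("h:19 v:5", (11.5, 30.0, 23.0, 40.0)),
    ("h:20 v:5", (23.0, 30.0, 35.0, 40.0)),
]


def intersects(a, b):
    """Return True if two (min_x, min_y, max_x, max_y) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_for_bbox(bbox, grid):
    """Return zero-padded {"x", "y"} dicts for every cell overlapping bbox."""
    cells = []
    for name, cell_bbox in grid:
        match = re.search(r"h:(\d+) v:(\d+)", name)
        if match and intersects(bbox, cell_bbox):
            h, v = match.groups()
            cells.append({"x": f"{int(h):02d}", "y": f"{int(v):02d}"})
    return cells


# Albania's bounding box, the same one used in the example further down
bbox = (19.3044861183, 39.624997667, 21.0200403175, 42.6882473822)
print(cells_for_bbox(bbox, GRID))  # [{'x': '19', 'y': '04'}, {'x': '19', 'y': '05'}]
```

Because the list of cells is fixed by the grid file, the set of patterns to search is known before any remote listing happens.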

In [None]:
# Imports
import geopandas as gpd
import fiona
import re
import datetime

# Enable fiona's KML driver (read/write) so geopandas can read KML grid files
fiona.drvsupport.supported_drivers["kml"] = "rw"
fiona.drvsupport.supported_drivers["KML"] = "rw"

In [None]:
grid_fp = "../spacetime_tools/stitching/sample_data/sample_kmls/modis.kml"

gdf = gpd.read_file(grid_fp)

In [None]:
print(gdf)

In [None]:
# Before going further, define a function that extracts h and v from a GeoDataFrame row
# and returns them as a dictionary of zero-padded x and y values
# Example below
def fn(x):
    # \d+ (not \d*) so that an empty match can't reach int("")
    match = re.search(r"h:(\d+) v:(\d+)", x.Name)
    if match:
        h, v = match.groups()
        return {
            "x": f"{int(h):02d}",
            "y": f"{int(v):02d}",
        }
    # Rows whose Name doesn't follow the "h:<n> v:<n>" pattern are skipped
    return None

In [None]:
# Running the function on each row returns a dictionary like the ones printed below
for df_row in gdf.itertuples():
    print(fn(df_row))
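Because `itertuples()` yields named tuples, the function only relies on a `Name` attribute. The sketch below exercises the same parsing logic without the grid file, using hypothetical rows built with `collections.namedtuple`; the `Name` values are made up to follow the `h:<n> v:<n>` pattern the regular expression expects. Rows that don't match return `None`, so it may be worth filtering them out.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the rows yielded by gdf.itertuples()
Row = namedtuple("Row", ["Name"])


def fn(x):
    # Mirrors the parsing in fn above: pull h and v out of the Name field
    match = re.search(r"h:(\d+) v:(\d+)", x.Name)
    if match:
        h, v = match.groups()
        return {"x": f"{int(h):02d}", "y": f"{int(v):02d}"}
    return None


rows = [Row("h:1 v:8"), Row("h:19 v:4"), Row("not a tile")]
results = [fn(row) for row in rows]
print(results)  # [{'x': '01', 'y': '08'}, {'x': '19', 'y': '04'}, None]

# Keep only rows that parsed successfully
tiles = [r for r in results if r is not None]
```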

In [None]:
# Now we are ready to stitch our scenes together
# As an example, we use Albania's bounding box and fetch data for January 2017 from s3://modis-pds/MCD43A4.006/

# MODIS MCD43A4 files are daily, so we get one set of scene files per day
bbox = (19.3044861183, 39.624997667, 21.0200403175, 42.6882473822)  # (min_lon, min_lat, max_lon, max_lat)
date_range = (datetime.datetime(2017, 1, 1), datetime.datetime(2017, 1, 31))  # End date is inclusive
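With an inclusive end date, January 2017 covers 31 days. Here is a minimal sketch of enumerating those days and their `%Y%j` (year plus day-of-year) codes, the form that appears in modis-pds paths such as `2013160`, using only the standard library. How `set_timebounds` enumerates dates internally is not shown in this notebook and may differ.

```python
import datetime

start = datetime.datetime(2017, 1, 1)
end = datetime.datetime(2017, 1, 31)  # inclusive, as in the example above

# Step one day at a time; the +1 includes the end date itself
n_days = (end - start).days + 1
days = [start + datetime.timedelta(days=i) for i in range(n_days)]

# %Y%j renders each date as year + zero-padded day of year
day_codes = [d.strftime("%Y%j") for d in days]
print(len(day_codes), day_codes[0], day_codes[-1])  # 31 2017001 2017031
```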

In [None]:
# Then we set the source path and initialize our dataset
# The source is derived from the full path of a single scene file by replacing its variable parts with placeholders
# E.g., the path of a single scene file:
# s3://modis-pds/MCD43A4.006/01/08/2013160/MCD43A4.A2013160.h00v08.006.2016138043045_B07.TIF
# In the above path:
# 01 corresponds to x (h in the grid file)
# 08 corresponds to y (v in the grid file)
# 2013160 can be written as %Y%j (year + day of year) using Python's standard format codes:
# https://docs.python.org/3/library/datetime.html#format-codes
# Finally, the file name can be written as a glob: *_B07.TIF would select Band 7 only,
# while *_B0?.TIF (used below) matches every single-digit band, i.e. B01 to B07 for MCD43A4

# More details on the bucket layout:
# https://docs.opendata.aws/modis-pds/readme.html

# The source variable then becomes
source = "s3://modis-pds/MCD43A4.006/{x}/{y}/%Y%j/*_B0?.TIF"
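To make the placeholders concrete, here is one plausible way such a template could be expanded for a single grid cell and day: `str.format` fills `{x}` and `{y}`, `strftime` fills `%Y%j`, and `fnmatch.fnmatchcase` tests object keys against the remaining glob. This is an assumption about the mechanics, not the module's code; the `expand` helper, the tile indices, and the file names are hypothetical.

```python
import datetime
from fnmatch import fnmatchcase

source = "s3://modis-pds/MCD43A4.006/{x}/{y}/%Y%j/*_B0?.TIF"


def expand(template, cell, day):
    """Hypothetical helper: fill {x}/{y} from a grid cell, then date codes via strftime."""
    return day.strftime(template.format(**cell))


# Hypothetical tile and date, for illustration only
pattern = expand(source, {"x": "19", "y": "04"}, datetime.datetime(2017, 1, 1))
print(pattern)  # s3://modis-pds/MCD43A4.006/19/04/2017001/*_B0?.TIF

prefix = "s3://modis-pds/MCD43A4.006/19/04/2017001/"
candidates = [
    prefix + "scene_B07.TIF",    # single-digit band: matches
    prefix + "scene_B07qa.TIF",  # '?' matches exactly one character: no match
    prefix + "scene_B12.TIF",    # not a B0? band: no match
]
matches = [key for key in candidates if fnmatchcase(key, pattern)]
print(matches)
```

Applying `str.format` before `strftime` is safe here only because the template contains no other braces or `%` characters; a literal `%` in a path would need to be written as `%%`.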

In [None]:
# MODIS data is daily, so we create one COG (Cloud Optimized GeoTIFF) per day, named using the date codes in the path
destination = "/Volumes/Data/spacetime-tools/final/modis-pds/%d-%m-%Y-b07.TIF"
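Assuming the date codes in `destination` are filled per day with standard `strftime` semantics (the notebook does not show this explicitly), the first and last output paths for January 2017 would look like this:

```python
import datetime

destination = "/Volumes/Data/spacetime-tools/final/modis-pds/%d-%m-%Y-b07.TIF"

# Assumption: each day's output path is produced with strftime
first = datetime.datetime(2017, 1, 1)
last = datetime.datetime(2017, 1, 31)
print(first.strftime(destination))
# /Volumes/Data/spacetime-tools/final/modis-pds/01-01-2017-b07.TIF
print(last.strftime(destination))
# /Volumes/Data/spacetime-tools/final/modis-pds/31-01-2017-b07.TIF
```

Note that `%d-%m-%Y` file names don't sort chronologically across months; `%Y-%m-%d` would.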

In [None]:
# Importing the dataset class from spacetime_tools.stitching
from spacetime_tools.stitching.classes import dataset

In [None]:
# AWS region of the modis-pds bucket
region = "us-west-2"

In [None]:
# This initializes the dataset object
ds = dataset.DataSet("modis-pds", "s3", source, overwrite=False)

# Setting time bounds
ds.set_timebounds(date_range[0], date_range[1])

# Setting spatial bounds
ds.set_spacebounds(bbox, grid_fp, fn)

In [None]:
# Getting distinct bands
bands = ds.get_distinct_bands()

In [None]:
# Downloading scene files
ds.sync()
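`overwrite=False` suggests that `sync()` avoids re-downloading scene files already present locally, which makes re-running the notebook cheap. Below is a minimal sketch of that skip logic, assuming remote keys are mirrored into a local directory with the same layout. `plan_downloads`, the directory layout, and the keys are hypothetical; the module's actual cache location and behaviour are not shown in this notebook.

```python
import os
import tempfile


def plan_downloads(keys, local_dir, overwrite=False):
    """Return the remote keys that still need to be downloaded.

    Hypothetical helper: each key is mirrored under local_dir with its
    bucket-relative layout; with overwrite=False, keys whose local copy
    already exists are skipped.
    """
    pending = []
    for key in keys:
        # "s3://bucket/a/b.TIF" -> "<local_dir>/bucket/a/b.TIF"
        local_path = os.path.join(local_dir, key.split("://", 1)[1])
        if overwrite or not os.path.exists(local_path):
            pending.append(key)
    return pending


keys = [
    "s3://modis-pds/MCD43A4.006/19/04/2017001/scene_B01.TIF",
    "s3://modis-pds/MCD43A4.006/19/04/2017001/scene_B07.TIF",
]

with tempfile.TemporaryDirectory() as tmp:
    # Pretend an earlier run already downloaded the Band 1 file
    existing = os.path.join(tmp, keys[0].split("://", 1)[1])
    os.makedirs(os.path.dirname(existing))
    open(existing, "w").close()

    pending_default = plan_downloads(keys, tmp)              # only the B07 key
    pending_all = plan_downloads(keys, tmp, overwrite=True)  # both keys
    print(pending_default)
    print(pending_all)
```

Mirroring the full key path, rather than just the file name, keeps files from different tiles or days from colliding locally.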

In [None]:
# Finally, stitch the scenes together, arranging the output bands in the order listed below
ds.to_cog(
    destination,
    bands=[
        "Nadir_Reflectance_Band1",
        "Nadir_Reflectance_Band3",
        "Nadir_Reflectance_Band7",
    ],
)
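The `bands` list appears to control both which bands are kept and their order in the output COG. The sketch below shows one way to validate such a list against the available band names and map each one to its output position. `band_order` and the `available` list are hypothetical (the list is assumed to look like the names passed to `to_cog`), and the module may handle this differently.

```python
def band_order(available, requested):
    """Return (source_band, output_band) pairs, both 1-based.

    Hypothetical helper: output band 1 comes from the first requested name,
    output band 2 from the second, and so on.
    """
    missing = [name for name in requested if name not in available]
    if missing:
        raise ValueError(f"Bands not found in scene files: {missing}")
    return [
        (available.index(name) + 1, out_index)
        for out_index, name in enumerate(requested, start=1)
    ]


# Hypothetical list of available bands, assumed to look like the names used above
available = [f"Nadir_Reflectance_Band{i}" for i in range(1, 8)]
requested = [
    "Nadir_Reflectance_Band1",
    "Nadir_Reflectance_Band3",
    "Nadir_Reflectance_Band7",
]
print(band_order(available, requested))  # [(1, 1), (3, 2), (7, 3)]
```

Reversing `requested` would give `[(7, 1), (3, 2), (1, 3)]`: the same bands written in the opposite order.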