![OpenSARlab notebook banner](NotebookAddons/blackboard-banner.png)

# Prepare a HyP3 smallbaseline InSAR stack and convert to a CARD4L compliant NetCDF and/or Zarr Store 
### Alex Lewandowski; Alaska Satellite Facility

<img style="padding: 7px" src="NotebookAddons/UAFLogo_A_647.png" width="170" align="right"/>

**This notebook:**
1. Downloads all or part of a HyP3 smallbaseline InSAR stack
1. Projects all scenes to the same EPSG
1. Creates an `xarray.Dataset` containing the InSAR stack and its metadata
  
    
**This notebook does NOT merge data from the same date because:**
- The stack contains timesteps for acquisition start-times at second-scale precision
- Each scene will have a different acquisition start-time and their data will occupy different timesteps

### Important Note about JupyterHub

**Your JupyterHub server will automatically shut down when left idle for more than 1 hour. Your notebooks will not be lost, but you will have to restart their kernels and re-run them from the beginning. You will not be able to seamlessly continue running a partially run notebook.**

In [None]:
import url_widget as url_w
notebookUrl = url_w.URLWidget()
display(notebookUrl)

In [None]:
from IPython.display import Markdown
from IPython.display import display

notebookUrl = notebookUrl.value
user = !echo $JUPYTERHUB_USER
env = !echo $CONDA_PREFIX
if env[0] == '':
    env[0] = 'Python 3 (base)'
if env[0] != '/home/jovyan/.local/envs/rtc_analysis':
    display(Markdown(f'<text style=color:red><strong>WARNING:</strong></text>'))
    display(Markdown(f'<text style=color:red>This notebook should be run using the "rtc_analysis" conda environment.</text>'))
    display(Markdown(f'<text style=color:red>It is currently using the "{env[0].split("/")[-1]}" environment.</text>'))
    display(Markdown(f'<text style=color:red>Select the "rtc_analysis" from the "Change Kernel" submenu of the "Kernel" menu.</text>'))
    display(Markdown(f'<text style=color:red>If the "rtc_analysis" environment is not present, use <a href="{notebookUrl.split("/user")[0]}/user/{user[0]}/notebooks/conda_environments/Create_OSL_Conda_Environments.ipynb"> Create_OSL_Conda_Environments.ipynb </a> to create it.</text>'))
    display(Markdown(f'<text style=color:red>Note that you must restart your server after creating a new environment before it is usable by notebooks.</text>'))

## 0. Importing Relevant Python Packages

In this notebook we will use the following scientific libraries:

1. [GDAL](https://www.gdal.org/) is a software library for reading and writing raster and vector geospatial data formats. It includes a collection of programs tailored for geospatial data processing. Most modern GIS systems (such as ArcGIS or QGIS) use GDAL in the background.
1. [NumPy](http://www.numpy.org/) is one of the principal packages for scientific applications of Python. It is intended for processing large multidimensional arrays and matrices, and an extensive collection of high-level mathematical functions and implemented methods makes it possible to perform various operations with these objects.

**Our first step is to import them:**

In [None]:
%%capture
import copy
from datetime import datetime, timedelta, timezone
import json # for loads
import math
from pathlib import Path
import re
import shutil
import sys
from tqdm.auto import tqdm 
from typing import Union
import warnings

from ipyfilechooser import FileChooser

import numpy as np
from osgeo import gdal
import pycrs
import s3fs
import xarray as xr
import yaml
import zarr

from IPython.display import display, clear_output, Markdown

import asf_search
import opensarlab_lib as osl
from hyp3_sdk import Batch, HyP3

## 1. Load Your Own Data Stack Into the Notebook

This notebook assumes that you've created an InSAR data stack over your personal area of interest using the [Alaska Satellite Facility's](https://www.asf.alaska.edu/) value-added product system HyP3, available via [ASF Data Search (Vertex)](https://search.asf.alaska.edu/#/). HyP3 is an ASF service used to prototype value-added products and provide them to users to collect feedback.

We will retrieve HyP3 data via the hyp3_sdk or work with previously downloaded data. As both HyP3 and the Notebook environment sit in the [Amazon Web Services (AWS)](https://aws.amazon.com/) cloud, data transfer is quick and cost effective.

---

If downloading data, create a data directory in which to download HyP3 InSAR products.

If working with previously downloaded HyP3 InSAR products, each product should contain the amplitude, coherence, DEM, look vector, unwrapped phase, and water mask GeoTiffs, the HyP3 metadata text file, and the HyP3 product README in subdirectories of the data directory:

```
data_directory   
│
└───product_1_directory
│   │   *_amp.tif
│   │   *_corr.tif
│   │   *_dem.tif
│   │   *_lv_phi.tif
│   │   *_lv_theta.tif
│   │   *_unw_phase.tif
│   │   *_water_mask.tif
│   │   *.README.md.txt
│   │   *.txt
│   ...
│   
└───product_2_directory
│   │   *_amp.tif
│   │   *_corr.tif
│   │   *_dem.tif
│   │   *_lv_phi.tif
│   │   *_lv_theta.tif
│   │   *_unw_phase.tif
│   │   *_water_mask.tif
│   │   *.README.md.txt
│   │   *.txt
│   ...
│ 
...
```

**Select or create a data directory:**

In [None]:
print("Download HyP3 data or work with previously downloaded data?")
data_source = osl.select_parameter(['Download data from HyP3', 'Use existing data'])
display(data_source)

In [None]:
download = 'Download' in data_source.value
if download:
    choice = None
    while True:
        print(f"Current working directory: {Path.cwd()}")
        data_dir = Path(input(f"\nPlease enter the name of a directory in which to store your downloaded data."))
        if data_dir == Path('.'):
            continue
        if data_dir.is_dir():
            contents = data_dir.glob('*')
            if len(list(contents)) > 0:
                choice = osl.handle_old_data(data_dir)
                if choice == 1:
                    if data_dir.exists():
                        shutil.rmtree(data_dir)
                    data_dir.mkdir()
                    break
                elif choice == 2:
                    break
                else:
                    clear_output()
                    continue
            else:
                break
        else:
            data_dir.mkdir()
            break
else:
    print("Select your data directory")
    fc = FileChooser(Path.cwd())
    display(fc)

**Define the absolute path to the data directory:**

In [None]:
if download:
    data_directory = Path.cwd()/data_dir
else:
    data_directory = Path(fc.selected_path)

print(f"data_directory: {data_directory}")

**Create a HyP3 object and authenticate:**

In [None]:
if download:
    hyp3 = HyP3(prompt=True)

**Decide whether to search for a HyP3 project or jobs unattached to a project:** 

In [None]:
if download: 
    options = ['project', 'projectless jobs']
    search_type = osl.select_parameter(options, '')
    print("Select whether to search for HyP3 Project or HyP3 Jobs unattached to a project")
    display(search_type)

**List projects containing active products of the type chosen in the previous cell and select one:**

In [None]:
if download:
    my_hyp3_info = hyp3.my_info()
    active_projects = dict()

    if search_type.value == 'project':
        for project in my_hyp3_info['job_names']:
            batch = Batch()
            batch = hyp3.find_jobs(name=project, job_type='INSAR_GAMMA').filter_jobs(running=False, include_expired=False)
            if len(batch) > 0:
                active_projects.update({batch.jobs[0].name: batch})

        if len(active_projects) > 0:
            display(Markdown("<text style='color:darkred;'>Note: After selecting a project, you must select the next cell before hitting the 'Run' button or typing Shift/Enter.</text>"))
            display(Markdown("<text style='color:darkred;'>Otherwise, you will rerun this code cell.</text>"))
            print('\nSelect a Project:')
            project_select = osl.select_parameter(active_projects)
            display(project_select)
    if search_type.value == 'projectless jobs' or len(active_projects) == 0:
        project_select = False
        if search_type.value == 'project':
            print(f"There were no {'INSAR_GAMMA'} jobs found in any current projects.\n")
        jobs = hyp3.find_jobs(job_type='INSAR_GAMMA').filter_jobs(running=False, include_expired=False)
        orphaned_jobs = Batch()
        for j in jobs:
            if not j.name:
                orphaned_jobs += j
        jobs = orphaned_jobs

        if len(jobs) > 0:
            print(f"Found {len(jobs)} {'INSAR_GAMMA'} jobs that are not part of a project.")
            print(f"Select the jobs you wish to download")
            jobs = {i.files[0]['filename']: i for i in jobs}
            jobs_select = osl.select_mult_parameters(jobs, '', width='500px')
            display(jobs_select)
        else:
            print(f"There were no {'INSAR_GAMMA'} jobs found that are not part of a project either.")

**Select a date range of products to download:**

In [None]:
if download:
    if project_select:
        batch = project_select.value
    else:
        batch = Batch()
        for j in jobs_select.value:
            batch += j
    display(Markdown("<text style='color:darkred;'>Note: After selecting a date range, you should select the next cell before hitting the 'Run' button or typing Shift/Enter.</text>"))
    display(Markdown("<text style='color:darkred;'>Otherwise, you may simply rerun this code cell.</text>"))
    print('\nSelect a Date Range:')
    dates = osl.get_job_dates(batch)
    date_picker = osl.gui_date_picker(dates)
    display(date_picker)

**Save the selected date range and remove products falling outside of it:**

In [None]:
if download:
    date_range = osl.get_slider_vals(date_picker)
    date_range[0] = date_range[0].date()
    date_range[1] = date_range[1].date()
    print(f"Date Range: {str(date_range[0])} to {str(date_range[1])}")
    batch = osl.filter_jobs_by_date(batch, date_range)

**Gather the available paths and orbit directions for the remaining products:**

In [None]:
if download:
    display(Markdown("<text style='color:darkred;'><text style='font-size:150%;'>This may take some time for projects containing many jobs...</text></text>"))
    osl.set_paths_orbits(batch)
    paths = set()
    orbit_directions = set()
    for p in batch:
        paths.add(p.path)
        orbit_directions.add(p.orbit_direction)
    paths.add('All Paths')
    display(Markdown(f"<text style=color:blue><text style='font-size:175%;'>Done.</text></text>"))

---
**Select a path or paths (use shift or ctrl to select multiple paths):**

In [None]:
if download:
    display(Markdown("<text style='color:darkred;'>Note: After selecting a path, you must select the next cell before hitting the 'Run' button or typing Shift/Enter.</text>"))
    display(Markdown("<text style='color:darkred;'>Otherwise, you will simply rerun this code cell.</text>"))
    print('\nSelect a Path:')
    path_choice = osl.select_mult_parameters(paths)
    display(path_choice)

**Save the selected flight path/s:**

In [None]:
if download:
    flight_path = path_choice.value
    if flight_path:
        print(f"Flight Path: {flight_path}")
    else:
        print("WARNING: You must select a flight path in the previous cell, then rerun this cell.")

**Select an orbit direction:**

In [None]:
if download:
    if len(orbit_directions) > 1:
        display(Markdown("<text style='color:red;'>Note: After selecting a flight direction, you must select the next cell before hitting the 'Run' button or typing Shift/Enter.</text>"))
        display(Markdown("<text style='color:red;'>Otherwise, you will simply rerun this code cell.</text>"))
    print('\nSelect a Flight Direction:')
    direction_choice = osl.select_parameter(orbit_directions, 'Direction:')
    display(direction_choice)

**Save the selected orbit direction:**

In [None]:
if download:
    direction = direction_choice.value
    print(f"Orbit Direction: {direction}")

**Filter jobs by path and orbit direction:**

In [None]:
if download:
    batch = osl.filter_jobs_by_path(batch, flight_path)
    batch = osl.filter_jobs_by_orbit(batch, direction)
    print(f"There are {len(batch)} products to download.")

**Download the products, unzip them into a directory named after the product type, and delete the zip files:**

In [None]:
if download:
    print(f"\nProject: {batch.jobs[0].name}")
    project_zips = batch.download_files(data_directory)
    for z in project_zips:
        osl.asf_unzip(str(data_directory), str(z))
        z.unlink()

**Collect the paths to the GeoTiffs**

In [None]:
product_paths = list(data_directory.glob('*/*.tif'))
# product_paths

---
## 2. Fix multiple UTM Zone-related issues

Fix multiple UTM Zone-related issues should they exist in your data set. If multiple UTM zones are found, the following code cells will identify the predominant UTM zone and reproject the rest into that zone. This step must be completed prior to merging frames or performing any analysis. AutoRIFT products do not come with projection metadata and so will not be reprojected.

**Use gdal.Info to determine the UTM definition types and zones in each product:**

In [None]:
utm_zones = []
utm_types = []
print('Checking UTM Zones in the data stack ...\n')
for k in range(0, len(product_paths)):
    info = (gdal.Info(str(product_paths[k]), options = ['-json']))
    info = json.dumps(info)
    info = (json.loads(info))['coordinateSystem']['wkt']
    zone = info.split('ID')[-1].split(',')[1][0:-2]
    utm_zones.append(zone)
    typ = info.split('ID')[-1].split('"')[1]
    utm_types.append(typ)
print(f"UTM Zones:\n {utm_zones}\n")
print(f"UTM Types:\n {utm_types}")

**Identify the most commonly used UTM Zone in the data:**

In [None]:
utm_unique, counts = np.unique(utm_zones, return_counts=True)
a = np.where(counts == np.max(counts))
predominant_utm = utm_unique[a][0]
print(f"Predominant UTM Zone: {predominant_utm}")
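
The selection logic above can be tried without any GeoTiffs. The following is a minimal sketch that uses `collections.Counter` in place of `np.unique`; the list of EPSG codes is made up for illustration. Note that the two approaches may break ties differently, so this is an illustration of the idea rather than a drop-in replacement.

```python
from collections import Counter

# Hypothetical EPSG codes, as they might be parsed from five products
example_zones = ['32606', '32605', '32606', '32606', '32605']

# most_common(1) returns [(value, count)] for the most frequent value
predominant, count = Counter(example_zones).most_common(1)[0]

# Indices of the products that would need to be reprojected
to_reproject = [i for i, z in enumerate(example_zones) if z != predominant]

print(predominant, count, to_reproject)
```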

**Reproject all tiffs to the predominant UTM:**

In [None]:
reproject_indicies = [i for i, j in enumerate(utm_zones) if j != predominant_utm] # indices in utm_zones of products that need to be reprojected
print('--------------------------------------------')
print(f'Reprojecting {len(reproject_indicies)} files')
print('--------------------------------------------')
for k in reproject_indicies:
    temppath = f"{str(product_paths[k].parent)}/r{product_paths[k].name}"
    print(temppath)  

    cmd = f"gdalwarp -overwrite {product_paths[k]} {temppath} -s_srs {utm_types[k]}:{utm_zones[k]} -t_srs EPSG:{predominant_utm}"
#     print(cmd)
    !{cmd}

    product_paths[k].unlink()
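
The cell above interpolates paths directly into a shell command, which can fail if a path contains spaces. As a sketch of one alternative, the arguments could be assembled as a list and handed to `subprocess.run`. The helper name `build_gdalwarp_args` and the example path are hypothetical; the `gdalwarp` flags are the same ones used above.

```python
from pathlib import Path

def build_gdalwarp_args(src, dst_epsg, src_srs):
    """Hypothetical helper: return (args, output_path) for a gdalwarp call."""
    src = Path(src)
    # same 'r' filename prefix convention as the reprojection cell above
    dst = src.parent / f"r{src.name}"
    args = ['gdalwarp', '-overwrite',
            '-s_srs', src_srs,
            '-t_srs', f'EPSG:{dst_epsg}',
            str(src), str(dst)]
    return args, dst

# Made-up path containing a space
args, out_path = build_gdalwarp_args('data/prod 1/scene_amp.tif', '32606', 'EPSG:32605')
print(args)
```

The resulting list could then be passed to `subprocess.run(args, check=True)`, which raises an exception if `gdalwarp` exits with an error.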

In [None]:
product_paths = list(data_directory.glob('*/*.tif'))
product_paths.sort()
for p in product_paths:
    print(p)

In [None]:
# TODO: Tidy these functions up, add type hints and doc strings, add to opensarlab-lib

def dates_from_product_name(product_name: Union[str, Path]) -> Union[str, None]:
    """
    Takes: a string or posix path to a HyP3 product
    Returns: a string date and timestamp parsed from the name or None if none found
    """
    regex = "[0-9]{8}T[0-9]{6}_[0-9]{8}T[0-9]{6}"
    results = re.search(regex, str(product_name))
    if results:
        return results.group(0)
    else:
        return None
    
def get_hyp3_log_val(log_path, regex):
    with open(str(log_path), 'r') as f:
        lines = f.readlines()
        for line in lines:
            val = re.search(regex, line)
            if val:
                return val.group(0)
            
def get_beam_IDs(log_path, regex):
    with open(str(log_path), 'r') as f:
        ids = None
        lines = f.readlines()
        for line in lines:
            val = re.search(regex, line)
            if val and not ids:
                ids = val.group(0)
            elif val and ids:
                ids = f'{ids}, {val.group(0)}'
    return ids
            
def mission_from_filename(product_name: Union[str, Path]) -> Union[str, None]:
    regex = 'S1(A|B|C)'
    results = re.search(regex, Path(product_name).name)
    if results:
        return results.group(0)
    else:
        return None
            
def observation_mode_from_filename(product_name: Union[str, Path]) -> Union[str, None]:
    return Path(product_name).name.split('_')[1]

def orbit_data_source_from_filename(product_name: Union[str, Path]) -> Union[str, None]:
    orbit =  Path(product_name).name.split('_')[3][2]
    if orbit == 'P':
        return 'Precise'
    elif orbit == 'R':
        return 'Restituted'
    elif orbit == 'O':
        return 'Original Predicted'
    
def get_corners_gdal(file):
    ds=gdal.Open(str(file))
    transform = ds.GetGeoTransform()
    x = ds.RasterXSize
    y = ds.RasterYSize
    
    ulx = transform[0]
    uly = transform[3]
    lrx = transform[0] + x * transform[1]
    lry = transform[3] + y * transform[5]
    
    return {'ul': [ulx, uly], 'lr': [lrx, lry]}
    
def parse_proj_crs(proj_crs):
    crs = pycrs.parse.from_ogc_wkt(proj_crs)
    cfg_p = {}
    cfg_p['grid_mapping_name'] = crs.name
    cfg_p['crs_wkt'] = crs.proj.name.ogc_wkt.lower()

    # Is there a better way to do this? 
    for p in crs.params:
        if isinstance(p,pycrs.elements.parameters.LatitudeOrigin):
            cfg_p['latitude_of_projection_origin'] = p.value
        if isinstance(p,pycrs.elements.parameters.CentralMeridian):
            cfg_p['longitude_of_central_meridian'] = p.value
        if isinstance(p,pycrs.elements.parameters.FalseEasting):
            cfg_p['false_easting'] = p.value
        if isinstance(p,pycrs.elements.parameters.FalseNorthing):
            cfg_p['false_northing'] = p.value
        if isinstance(p,pycrs.elements.parameters.ScalingFactor):
            cfg_p['scale_factor_at_central_meridian'] = p.value

    cfg_p['projected_coordinate_system_name'] = crs.name
    cfg_p['geographic_coordinate_system_name'] = crs.geogcs.name
    cfg_p['horizontal_datum_name'] = crs.geogcs.datum.name.ogc_wkt
    cfg_p['reference_ellipsoid_name'] = crs.geogcs.datum.ellips.name.ogc_wkt
    cfg_p['semi_major_axis'] = crs.geogcs.datum.ellips.semimaj_ax.value
    cfg_p['inverse_flattening'] = crs.geogcs.datum.ellips.inv_flat.value
    cfg_p['longitude_of_prime_meridian'] = crs.geogcs.prime_mer.value
    cfg_p['units'] = crs.unit.unitname.ogc_wkt
    cfg_p['projection_x_coordinate'] = "x"
    cfg_p['projection_y_coordinate'] = "y"

    return cfg_p

def get_InSAR_prod_id(tiff):
    stem = Path(tiff).stem
    # four-character product ID that follows the "_G_xxx_" flags in the product name
    regex = r"(?<=_G_[uw][ec][1234F]_)\w{4}(?=_)"
    p_hash = re.search(regex, str(stem))
    if p_hash:
        return (p_hash.group(0))
    
def get_insar_product_type_from_filename(path):
    # product types are checked in this order; the first match wins
    product_types = ['amp', 'corr', 'dem', 'lv_phi', 'lv_theta', 'unw_phase', 'water_mask']
    for p_type in product_types:
        if re.search(rf'\w+_{p_type}\w*\.tif', str(path)):
            return p_type
    return None
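
To see what the filename helpers extract, here is a small self-contained sketch that applies the same kind of regular expressions to a made-up product filename. The filename is hypothetical and only follows the naming pattern these helpers assume.

```python
import re

# Hypothetical product filename following the pattern the helpers assume
name = "S1AA_20200101T000000_20200113T000000_VVP012_INT80_G_ueF_ABCD_unw_phase.tif"

# Reference and secondary acquisition timestamps
dates = re.search(r"[0-9]{8}T[0-9]{6}_[0-9]{8}T[0-9]{6}", name).group(0)

# Four-character product ID that follows the "_G_xxx_" flags
prod_id = re.search(r"(?<=_G_[uw][ec][1234F]_)\w{4}(?=_)", name).group(0)

print(dates)
print(prod_id)
```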

## 3. Create an xarray Dataset containing the InSAR stack

### 3.1. Write Functions to gather the needed CARD4L metadata from the InSAR GeoTiffs and metadata files

**Write a function to gather the metadata needed for each InSAR product:** 

In [None]:
def get_per_product_InSAR_vars(tiff):
    info = gdal.Info(str(tiff), format='json')
    
    source_file_name = Path(info['files'][0]).name
    
    prod_id = get_InSAR_prod_id(tiff)
    
    try:
        metadata_path = list(Path(tiff).parent.glob(f'*_{prod_id}.txt'))[0]
    except IndexError:
        print("Metadata file not found")
        raise
        
    # parse "key: value" lines, splitting on the first colon only
    metadata = dict()
    with open(metadata_path, 'r') as f:
        for line in f:
            if ':' not in line:
                continue
            key, value = line.split(':', 1)
            metadata[key.strip()] = value.strip()

    try:
        readme_path = list(Path(tiff).parent.glob('*.README.md.txt'))[0]
    except:
        print("README not found")
        raise
        
    date_regex = '[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2} [A|P]M'
    
    speckle = metadata['Speckle filter'].lower() == 'yes'
    range_bandpass_filter = metadata['Range bandpass filter'].lower() == 'yes'
    azimuth_bandpass_filter = metadata['Azimuth bandpass filter'].lower() == 'yes'
    
    ref_gran = metadata['Reference Granule']
    ref_metadata = asf_search.granule_search(ref_gran).geojson()
    
    var = {
        'Pedigree': {
            'ReferenceGranule': ref_gran,
            'SecondaryGranule': metadata['Secondary Granule'],
            'ProductName': '_'.join(Path(tiff).stem.split('_')[:-1]),
        },
        'SensorParameters': {
            'Sensor': ref_metadata['features'][0]['properties']['sensor'],
            'RadarFrequency': '',
            'ADCSamplingRate': '',
            'ChirpBandwidth': '',
            'PRF': '',
            'AzimuthProcBandwidth': '',
            'RangePixelSpacing': '',
        },
        'SceneParameters': {
            'Date': '',
            'StartTime': '',
            'CenterTime': '',
            'EndTime': '',
            'AzimuthLineTime': '',
            'RangeSamples': '',
            'AzimuthLines': '',
            'CenterLatitude': '',
            'CenterLongitude': '',
            'Heading': float(metadata['Heading']),
            'AzimuthPixelSpacing': '',
            'NearRangeSLC': '',
            'CenterRangeSLC': '',
            'FarRangeSLC': '',
            'IncidenceAngle': '',
            'SARToEarthCenter': '',
            'EarthRadiusAtNadir': float(metadata['Earth radius at nadir']),
            'EarthSemiMajor': '',
            'EarthSemiMinor': '',
            'PerpendicularBaseline': float(metadata['Baseline']),
        },
        
        'StateVectors': {
            'StateVectorQuantity': '',
            'TimeFirstStateVector': '',
            'StateVectorInterval': '',
            'StateVectorPosition1': '',
            'StateVectorVelocity1': '',
            
        },
        'ProcessingParameters': {
            'DopplerPolynomial': '',
            'ZeroDoppler': '',
            'AzimuthLooks': int(metadata['Azimuth looks']),
            'RangeLooks': int(metadata['Range looks']),
            'PhaseFilter': metadata['INSAR phase filter'],
            'PhaseFilterParameter': float(metadata['Phase filter parameter']),
            'PixelSpacing': 20 * int(metadata['Azimuth looks']),
            'RangeBandpassFilter': range_bandpass_filter,
            'AzimuthBandpassFilter': azimuth_bandpass_filter,
            'DEMSource': metadata['DEM source'],
            'DEMResolution': int(metadata['DEM resolution (m)']),
            'UTCTime': float(metadata['UTC time']),
            'UnwrappingType': metadata['Unwrapping type'],
            'UnwrappingThreshold': metadata['Unwrapping threshold'],
            'SpeckleFilter': speckle,
            
        },
    }
    
    return var



In [None]:
d = get_per_product_InSAR_vars(product_paths[0])
print(yaml.dump(d, default_flow_style=False))
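
The metadata file is read as plain `key: value` lines. A minimal sketch of that parsing step follows, assuming one pair per line and splitting only on the first colon so that values containing colons survive intact. The helper name `parse_key_value_lines` and the example lines are made up for illustration.

```python
def parse_key_value_lines(lines):
    """Parse 'key: value' lines into a dict, splitting on the first colon only."""
    metadata = {}
    for line in lines:
        if ':' not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(':', 1)
        metadata[key.strip()] = value.strip()
    return metadata

# Made-up lines in the layout assumed above
example_lines = [
    "Reference Granule: S1A_EXAMPLE_REFERENCE\n",
    "Azimuth looks: 4\n",
    "\n",
    "Processing note: value with: a colon\n",
]
md = parse_key_value_lines(example_lines)
print(md)
```

All values come back as strings, so numeric fields still need an explicit `int()` or `float()` conversion, as in the function above.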

In [None]:
def get_backscatter_stack_attrs(tiff):
    f = gdal.Open(str(tiff))
    info = gdal.Info(str(tiff), format='json')
    
    source_file_name = Path(info['files'][0]).name
    
    mission = mission_from_filename(source_file_name)
    polarization = polarization_from_filename(source_file_name)
    
    try:
        log_path = list(Path(tiff).parent.glob('*.log'))[0]
    except:
        print("Log file not found")
        raise
        
    date_regex = '[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2} [A|P]M'
    
    observation_mode = observation_mode_from_filename(source_file_name)
    
    pulse_rep_freq_regex = rf'(?<={date_regex} - INFO - Proc: effective PRF derived from azimuth).*: [0-9]{{,5}}\.[0-9]{{,10}}(?=\n)'
    pulse_rep_freq = get_hyp3_log_val(log_path, pulse_rep_freq_regex)
    try:
        pulse_rep_freq = float(pulse_rep_freq.split(' ')[-1])
    except: 
        print("No valid Pulse Repetition Frequency found") 
        raise
    
    projection = f.GetProjection()
    epsg = int(projection.split('AUTHORITY["EPSG","')[-1].split('"')[0])
    
    if epsg == 4326:
        y = 'Latitude'
        x = 'Longitude'
    else:
        y = 'Northing'
        x = 'Easting'
        
    azimuth_regex = r'(?<=(azimuth angle: ))[0-9]{,4}\.[0-9]{,6} \((right|left) looking\)'
    azimuth = get_hyp3_log_val(log_path, azimuth_regex)
    antenna_pointing = azimuth.split(' ')[1][1:]
    
    beam_id_regex = rf'(?<={date_regex} - INFO - Proc: sensor: {mission} {observation_mode} )\w{{,4}}(?= {polarization}\n)'
    beam_ids = get_beam_IDs(log_path, beam_id_regex)
        
    attrs =  {
        'CoordinateReferenceSystem': {
            'EPSG': epsg,
            'WKT': projection,
        },
        'DataCollectionTime': {
            'NumberOfAcquisitions': None,
            'FirstAcquisitionDate': None,
            'LastAcquisitionDate': None,
        },
        'PixelCoordinateConvention': f.GetMetadata()['AREA_OR_POINT'],
        'Product': 'Normalized Radar Backscatter (Radiometrically Terrain-Corrected)',
        'SourceAttributes': {
            'Instrument': 'C-SAR',
            'SourceDataAcquisitionParameters': {
                'RadarBand': 'C',
                'RadarCenterFrequency': pulse_rep_freq,
                'ObservationMode': observation_mode,
                'Polarizations': ' '.join(get_polarizations_in_dir(Path(tiff).parent)),
                'AntennaPointing': antenna_pointing,
                'BeamID': beam_ids,
            },
        },
    }
    return attrs

In [None]:
# Uncomment to print an example call to `get_backscatter_stack_attrs`

# d = get_backscatter_stack_attrs(product_paths[0])
# print(yaml.dump(d, default_flow_style=False))

**Write a function to create an xarray.Dataset for a single InSAR product**

In [None]:
def hyp3_mintpy_InSAR_to_xarray(insar_dir):
    
    insar_paths = list(Path(insar_dir).glob('*.tif'))
    insar_paths.sort()
    
    insar_vars = get_per_product_InSAR_vars(insar_paths[1])
    
    # read each product type into a masked array, keyed by product type
    # (the DEM is skipped here; a single DEM is added to the stack later)
    arrays = {}
    for f in insar_paths:
        prod_type = get_insar_product_type_from_filename(f)
        if prod_type and prod_type != 'dem':
            ds = gdal.Open(str(f))
            data = ds.GetRasterBand(1).ReadAsArray()
            arrays[prod_type] = np.ma.masked_invalid(data, copy=True)

    ds = gdal.Open(str(insar_paths[1]))
    
    # pixel resolution
    geo_trans = ds.GetGeoTransform()
    res_x = geo_trans[1]
    res_y = geo_trans[5]
    
    # get upper-left corner coords
    corners = get_corners_gdal(insar_paths[0])
    ulx, uly = corners['ul']

    # create x and y arrays from the upper-left corner, pixel resolution, and raster size
    # (indexing by pixel count avoids off-by-one lengths from np.arange with float steps)
    x_coords = ulx + res_x * np.arange(ds.RasterXSize)
    y_coords = uly + res_y * np.arange(ds.RasterYSize)

    # create xarray dataset
    layers = ['amp', 'corr', 'lv_phi', 'lv_theta', 'unw_phase', 'water_mask']
    data_set = xr.Dataset(
        data_vars={layer: (('y', 'x'), arrays[layer].filled(0.0)) for layer in layers},
        coords={'y': y_coords, 'x': x_coords},
        attrs=None
    )

    # Set x and y coord attributes
    attrs_x = {
        'axis': 'X',
        'units': 'm',
        'standard_name': 'projection_x_coordinate',
        'long_name': 'Easting'
    }
    attrs_y = {
        'axis': 'Y',
        'units': 'm',
        'standard_name': 'projection_y_coordinate',
        'long_name': 'Northing'
    }
    for key in attrs_x:
        data_set.x.attrs[key] = attrs_x[key]
    for key in attrs_y:
        data_set.y.attrs[key] = attrs_y[key]  
        
    for k in insar_vars.keys():
        data_set[k] = json.dumps(insar_vars[k])

    return data_set
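
The x and y coordinate arrays in the function above are derived from the GDAL geotransform. The sketch below shows that relationship in plain Python, building each axis from the raster size rather than stepping toward an end value, so the number of coordinates always matches the number of pixels. The geotransform values and raster size are made up, and `axis_coords` is a hypothetical helper name.

```python
def axis_coords(origin, pixel_size, n_pixels):
    """Upper-left-corner coordinate of each pixel along one axis."""
    return [origin + i * pixel_size for i in range(n_pixels)]

# Hypothetical geotransform: (ulx, x_res, 0, uly, 0, y_res); y_res is negative for north-up rasters
geo_trans = (500000.0, 80.0, 0.0, 7200000.0, 0.0, -80.0)
n_cols, n_rows = 4, 3

x = axis_coords(geo_trans[0], geo_trans[1], n_cols)
y = axis_coords(geo_trans[3], geo_trans[5], n_rows)

print(x)
print(y)
```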

In [None]:
def backscatter_pair_to_xarray(polar_pair):   
    bsv_gen = get_per_pair_RTC_vars(polar_pair[0])
    
    polarizations = get_polarizations_in_dir(Path(polar_pair[0]).parent)
    
    for tiff in polar_pair:
        bsv = get_per_pol_RTC_vars(tiff)
        ds=gdal.Open(str(tiff))
        polarization = bsv['BackscatterMeasurementData']['Polarization']
        banddata = ds.GetRasterBand(1)
        data = banddata.ReadAsArray()
        
        if polarization == 'VH' or polarization == 'HH':
            p1_backscatter = np.ma.masked_invalid(data, copy=True)
            p1_BackscatterMeasurementData = json.dumps(bsv['BackscatterMeasurementData'])
            p1_DocumentIdentifier = bsv['DocumentIdentifier']
            
        elif polarization == 'VV' or polarization == 'HV':
            p2_backscatter = np.ma.masked_invalid(data, copy=True)
            p2_BackscatterMeasurementData = json.dumps(bsv['BackscatterMeasurementData'])
            p2_DocumentIdentifier = bsv['DocumentIdentifier']

 
    ds=gdal.Open(str(polar_pair[0]))

    # get coordinate system projection
    prj = ds.GetProjection()
    crs = pycrs.parse.from_ogc_wkt(prj)
    crs_proj = crs.proj.name.ogc_wkt.lower()   

    # pixel resolution
    geo_trans = ds.GetGeoTransform()
    res_x = geo_trans[1]
    res_y = geo_trans[5]
    
    #get corner coords and extents
    corners = get_corners_gdal(polar_pair[0])
    x_extent = [corners['ul'][0], corners['lr'][0]]
    y_extent = [corners['ul'][1], corners['lr'][1]]

    # create x and y arrays based on extents and pixel resolution
    x_coords = np.arange(x_extent[0], x_extent[1], res_x)
    y_coords = np.arange(y_extent[0], y_extent[1], res_y)

    # create xarray dataset
    data_set = xr.Dataset(
        data_vars={
            'y': y_coords,
            'x': x_coords,
            f'{polarizations[0]}_backscatter': (
                ('y', 'x'),
                p1_backscatter.filled(0.0),
            ),
            f'{polarizations[1]}_backscatter': (
                ('y', 'x'),
                p2_backscatter.filled(0.0),
            ),
            f'{polarizations[0]}_BackscatterMeasurementData': p1_BackscatterMeasurementData,
            f'{polarizations[1]}_BackscatterMeasurementData': p2_BackscatterMeasurementData,
            f'{polarizations[0]}_DocumentIdentifier': p1_DocumentIdentifier,
            f'{polarizations[1]}_DocumentIdentifier': p2_DocumentIdentifier,
        },
        attrs=None
    )

    # Set x and y coord attributes
    attrs_x = {
        'axis': 'X',
        'units': 'm',
        'standard_name': 'projection_x_coordinate',
        'long_name': 'Easting'
    }
    attrs_y = {
        'axis': 'Y',
        'units': 'm',
        'standard_name': 'projection_y_coordinate',
        'long_name': 'Northing'
    }
    for key in attrs_x:
        data_set.x.attrs[key] = attrs_x[key]
    for key in attrs_y:
        data_set.y.attrs[key] = attrs_y[key]
        
    for k in bsv_gen.keys():
        data_set[k] = json.dumps(bsv_gen[k])
    
    return data_set

### 3.2. Create the xarray stack

**Create a list of paths to each InSAR directory**

In [None]:
def sort_by_date(a):
    return osl.date_from_product_name(a)

insar_dirs = [p for p in list(data_directory.glob('*')) if p.is_dir() and '.' not in str(p)]
insar_dirs.sort(key=sort_by_date)

**Create a list of xarray.Datasets for each InSAR product**

In [None]:
insar_arrays = []
for d in tqdm(insar_dirs):
    insar_arrays.append(hyp3_mintpy_InSAR_to_xarray(d))

In [None]:
insar_arrays[0]

**Prepare an `xarray.Dataset` to hold the stack**

- Contains the same x/y dimensions as stack data
- Contains a time dimension for the timesteps

In [None]:
times = [dates_from_product_name(d) for d in insar_dirs]

stack = insar_arrays[0]
variables = list(stack.variables)

for v in variables:
    if v not in ['x', 'y']:
        stack = stack.drop_vars(v)
        
stack = stack.assign_coords(time=times)
stack.time.attrs['axis'] = "T" 
stack.time.attrs['units'] = "timestamps in format %Y%m%dT%H%M%S_%Y%m%dT%H%M%S"
stack.time.attrs['calendar'] = "proleptic_gregorian"
stack.time.attrs['long_name'] = "Time" 
stack.time.attrs['description'] = "<reference scene acquisition time>_<secondary scene acquisition time>" 

print("Take a look at the stack.")
print("It should contain 'x', 'y', and 'time' Dimensions but no Attributes or Data Variables yet.")
stack
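
The `time` coordinate holds each interferogram's label as a string of the form `<reference scene acquisition time>_<secondary scene acquisition time>`, not as a datetime. As a minimal sketch, one label can be split back into its two acquisition times using only the standard library. The label value and the helper name `split_pair_label` are made up for illustration and are not part of the notebook.

```python
from datetime import datetime

# Format of each half of the label, as stated in the time coordinate's 'units' attribute
TIME_FORMAT = "%Y%m%dT%H%M%S"


def split_pair_label(label):
    """Split '<reference>_<secondary>' into two datetime objects."""
    reference, secondary = label.split("_")
    return (
        datetime.strptime(reference, TIME_FORMAT),
        datetime.strptime(secondary, TIME_FORMAT),
    )


# Hypothetical label, not taken from a real product
label = "20200101T153000_20200113T153000"
reference_time, secondary_time = split_pair_label(label)

# Temporal baseline of the pair in whole days
temporal_baseline_days = (secondary_time - reference_time).days
print(reference_time, secondary_time, temporal_baseline_days)
```

Because the labels begin with the reference time in a fixed-width, year-first format, sorting them as plain strings also orders them by reference acquisition time.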

**Populate the stack Variables and Attributes with**

- `xarray.DataArrays` for each amp, corr, lv_phi, lv_theta, unw_phase, and water mask array
- `xarray.DataArray` for a single DEM
- `xarray.Variables` holding InSAR product level metadata
- `xarray.Dataset.attrs` holding stack level metadata

In [None]:
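for v in variables:
    if v not in ['x', 'y']:
        # Stack this variable from every product along the time dimension
        stack[v] = xr.concat([d[v] for d in insar_arrays], dim=stack.time)
        # Drop the per-product copy to free memory. drop_vars returns a new Dataset,
        # so the list itself must be rebuilt for the drop to take effect.
        insar_arrays = [d.drop_vars(v) for d in insar_arrays]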

for p in product_paths:
    if 'dem' in str(p):
        dem = p
        break
ds = gdal.Open(str(dem))
banddata = ds.GetRasterBand(1)
data = banddata.ReadAsArray()
stack['dem'] = (('y', 'x'), np.ma.masked_invalid(data, copy=True))

            
attrs = {
    'DataCollectionTime': {
        'NumberOfAcquisitions': len(times),
        'FirstAcquisitionDate': times[0],
        'LastAcquisitionDate': times[-1],
    },
}
# Nested dicts are stored as JSON strings
attrs = {k: (json.dumps(v) if isinstance(v, dict) else v) for k, v in attrs.items()}
stack = stack.assign_attrs(attrs)

stack
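
The stack-level metadata is a nested dictionary, which the cell above converts to a JSON string before assigning it as an attribute. NetCDF attributes cannot hold dictionaries, so the string form lets the stack be written to file later. A small sketch of that round trip, using placeholder pair labels in place of `times`:

```python
import json

# Placeholder labels standing in for `times`; not taken from real products
example_times = [
    "20200101T153000_20200113T153000",
    "20200113T153000_20200125T153000",
    "20200125T153000_20200206T153000",
]

attrs = {
    'DataCollectionTime': {
        'NumberOfAcquisitions': len(example_times),
        'FirstAcquisitionDate': example_times[0],
        'LastAcquisitionDate': example_times[-1],
    },
}

# Serialize any dict-valued attribute to a JSON string
serialized = {k: (json.dumps(v) if isinstance(v, dict) else v) for k, v in attrs.items()}

# Reading the attribute back requires json.loads
recovered = json.loads(serialized['DataCollectionTime'])
print(recovered['FirstAcquisitionDate'], recovered['LastAcquisitionDate'])
```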

### 3.3. Write the stack to a NetCDF and/or Zarr store

**Save the time-series stack as a NetCDF**

In [None]:
%df
print("\nNoting your current available storage (shown above), do you wish to save the time-series as a NetCDF?")
netcdf = osl.select_parameter(['No, do NOT save a NetCDF', 'Yes, save a NetCDF'])
display(netcdf)

In [None]:
start = json.loads(stack.DataCollectionTime)['FirstAcquisitionDate']
end = json.loads(stack.DataCollectionTime)['LastAcquisitionDate']

if 'Yes' in netcdf.value:
    netcdf_path = data_directory/f"InSAR_stack_{start}__{end}.nc4"
    stack.to_netcdf(netcdf_path, engine='netcdf4')

**Save the time-series stack as a local or remote Zarr store, chunked for temporally optimized access, spatially optimized access, or both**

In [None]:
print("Do you wish to save the time-series as a local or remote Zarr store?")
locale = osl.select_parameter(['Local Zarr store (on my volume)', 'Remote Zarr store (in an S3 bucket)'])
display(locale)

In [None]:
print("Do you wish to save a temporally optimized Zarr store, a spatially optimized Zarr store, or both?")
opt = osl.select_parameter(['Temporally Optimized', 'Spatially Optimized', 'Both'])
display(opt)

In [None]:
def calc_chunks(stack: xr.Dataset, chunk_size=100, pol_count=2):
    """
    Calculate (time, y, x) chunk shapes for temporally and spatially optimized access.

    stack: the xr.Dataset for which to determine chunks
    chunk_size: target chunk size in MB
    pol_count: number of polarizations

    Assumes 32-bit pixels.
    Returns a dict with 'temporal' and 'spatial' chunk shapes.
    """
    bits_per_mb = 8000000
    bits_per_chunk = bits_per_mb * chunk_size
    bits_per_pixel = 32
    pixels_per_chunk = bits_per_chunk / bits_per_pixel
    depth = len(stack.time)

    # Temporal chunks span every timestep, so fewer pixels fit in each x/y footprint
    temp_op_xy_pixels = pixels_per_chunk // (depth * pol_count)
    # Spatial chunks hold a single timestep, so the x/y footprint can be larger
    spatial_op_xy_pixels = pixels_per_chunk // pol_count

    # Use square x/y footprints
    temp_x_y_side = math.floor(math.sqrt(temp_op_xy_pixels))
    spatial_x_y_side = math.floor(math.sqrt(spatial_op_xy_pixels))

    return {
        'temporal': (depth, temp_x_y_side, temp_x_y_side),
        'spatial': (1, spatial_x_y_side, spatial_x_y_side)
    }
    

In [None]:
calc_chunks(stack)
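
To make the arithmetic concrete, the sketch below repeats the calculation with plain integers for a hypothetical stack of 50 timesteps, keeping the defaults of 100 MB chunks, 32-bit pixels, and `pol_count=2`. The function name `chunk_shapes` and the depth of 50 are illustrative assumptions, not part of the notebook.

```python
import math


def chunk_shapes(depth, chunk_size_mb=100, pol_count=2, bits_per_pixel=32):
    """Return (time, y, x) chunk shapes for a stack with `depth` timesteps."""
    # Number of pixels that fit in one chunk of the requested size
    pixels_per_chunk = (8_000_000 * chunk_size_mb) // bits_per_pixel

    # Temporal: every timestep in the chunk, square x/y footprint
    temporal_side = math.isqrt(pixels_per_chunk // (depth * pol_count))
    # Spatial: one timestep in the chunk, square x/y footprint
    spatial_side = math.isqrt(pixels_per_chunk // pol_count)

    return {
        'temporal': (depth, temporal_side, temporal_side),
        'spatial': (1, spatial_side, spatial_side),
    }


shapes = chunk_shapes(depth=50)
print(shapes)
# {'temporal': (50, 500, 500), 'spatial': (1, 3535, 3535)}
```

The temporal shape covers the full time axis over a small footprint, which favors reading the complete time series of a pixel. The spatial shape covers one timestep over a large footprint, which favors reading whole scenes.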

**Write the Zarr store with a group for each selected chunking scheme**

In [None]:
# TODO handle bad s3 paths

if 'Local' in locale.value:
    store = data_directory/f"InSAR_stack_{start}__{end}.zarr"
else:
    s3_path = input("Enter the S3 path to your store: ")
    # Writing to a bucket normally requires credentials; anonymous access is typically read-only
    s3 = s3fs.S3FileSystem(anon=False)
    store = s3fs.S3Map(root=s3_path, s3=s3, check=False)

compressor = zarr.Blosc(cname='zstd', clevel=3)
chunks = calc_chunks(stack)
if 'Temporally' in opt.value or 'Both' in opt.value:
    encoding = {vname: {'compressor': compressor, 'chunks': chunks['temporal']} for vname in stack.data_vars}
    stack.to_zarr(store=store, encoding=encoding, consolidated=True, group='temporally_optimized', mode='w')

if 'Spatially' in opt.value or 'Both' in opt.value:
    encoding = {vname: {'compressor': compressor, 'chunks': chunks['spatial']} for vname in stack.data_vars}
    stack.to_zarr(store=store, encoding=encoding, consolidated=True, group='spatially_optimized', mode='w')
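
The cell above leaves the handling of bad S3 paths as a TODO. One possible pre-check is sketched below, assuming paths of the form `s3://bucket/key` or `bucket/key` and the general-purpose bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit). The helper `parse_s3_path` is hypothetical. It checks only the shape of the path, not whether the bucket exists or is writable.

```python
import re

# Bucket names: 3-63 chars, lowercase letters, digits, dots, hyphens,
# beginning and ending with a letter or digit
BUCKET_PATTERN = re.compile(r'^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$')


def parse_s3_path(s3_path):
    """Split an S3 path into (bucket, key), raising ValueError on a malformed bucket name."""
    path = s3_path.strip()
    if path.startswith('s3://'):
        path = path[len('s3://'):]
    bucket, _, key = path.partition('/')
    if not BUCKET_PATTERN.match(bucket):
        raise ValueError(f"Invalid S3 bucket name: {bucket!r}")
    return bucket, key.strip('/')


# Hypothetical path for illustration
bucket, key = parse_s3_path('s3://my-example-bucket/stacks/InSAR_stack.zarr')
print(bucket, key)
```

A check like this could run immediately after the `input` prompt, so that a malformed path is reported before any data is written.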

*Prepare_Hyp3_RTC_TimeSeries_NetCDF_Zarr.ipynb - Version 0.1.0 - March 2021*